


THE ANNALS OF MATHEMATICAL STATISTICS





Vol. 31, No. 4, December, 1960






Editorial Office, Department of Statistics, Eckhart Hall, University of Chicago, Chicago 37, Illinois. William Kruskal, Editor.
Manuscripts should be submitted to the editorial


papers 
legible. Footnotes should be reduced to a minimum, and where possible replaced by 
remarks in the text, or a bibliography at the end of the paper; formulae in footnotes should 
be avoided. References should follow current Annals style, and should be numbered alphabetically.


kappa and k, mu and u, nu and v, eta and n, etc. Subscripts or

clearly below or above the line. Bars above groups of letters (e.g., 

letters (e.g., x) are difficult to print and should be avoided. 

italicised by the printer and should not be underlined on manuscripts. Boldface letters may 
be indicated by underlining with a wavy line on the manuscript; 


solidus; thus (a + b)/(c + d) rather than $\frac{a + b}{c + d}$.


Authors will ordinarily receive only galley proofs. Fifty reprints without covers will be 
furnished free. Additional reprints and covers will be furnished at cost. 


Annals of Mathematical Statistics are addressed to either the Editor or the Treasurer, as de-


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC., BALTIMORE, MARYLAND, U. S. A.
Second-class postage paid at Baltimore, Maryland








The Twenty-Fifth Anniversary of 
the Founding of the Institute 
of Mathematical Statistics 


The Institute of Mathematical Statistics was established and organized on September 12, 1935, in a classroom at the University of Michigan in Ann Arbor. In 1960, the Institute completed twenty-five years of existence and reached its Silver Anniversary.


To commemorate this Anniversary and to describe 
the Institute’s origins to its present membership, Allen 
T. Craig, who was among those present at the Institute’s 
founding and who served as the first Secretary-Treasurer 
of the Institute, was invited to prepare an account of the 
early days of the Institute and of The Annals of Mathe- 
matical Statistics. Mr. Craig’s article appears on the fol- 
lowing pages. 


EDITOR 








REFERENCES TO THE ESTABLISHMENT OF 
THE INSTITUTE OF MATHEMATICAL STATISTICS 
AND THE ANNALS OF MATHEMATICAL STATISTICS 


Annals of Mathematical Statistics 


Vol. 1 (1930), pp. 1-2. Initial statement about publication of the Annals. 
Vol. 6 (1935). Initial statement about formation of the Institute.


Vol. 9 (1938), Second unnumbered page after p. 67. 


Journal of the American Statistical Association 
Vol. 25 (1930), p. 184 (Proceedings). 
Vol. 30 (1935), pp. 710-11. Formation of the Institute and program at the organizing 
meeting. 








EDITORIAL STAFF 


EDITOR
WILLIAM KRUSKAL


ASSOCIATE EDITORS


ALLAN BIRNBAUM, DONALD A. DARLING, OSCAR KEMPTHORNE

DOUGLAS G. CHAPMAN, WASSILY HOEFFDING, E. L. LEHMANN

W. S. CONNOR, N. L. JOHNSON, DAVID L. WALLACE

WITH THE COOPERATION OF

J. R. Blum, J. F. Daly, Harry Kesten, J. W. Pratt

R. C. Bose, Cyrus Derman, C. H. Kraft, Howard Raiffa

D. L. Burkholder, J. L. Doob, Solomon Kullback, H. E. Robbins

W. S. Connor, Meyer Dwass, Eugene Lukacs, Walter L. Smith

D. R. Cox, D. A. S. Fraser, G. E. Noether, Lionel Weiss

Samuel Karlin, Ingram Olkin


PAST EDITORS OF THE ANNALS


H. C. Carver, 1930-1938
S. S. Wilks, 1938-1949
T. W. Anderson, 1950-1952
E. L. Lehmann, 1953-1955
T. E. Harris, 1955-1958


Published quarterly by the Institute of Mathematical Statistics in March, 
June, September and December. 


IMS INSTITUTIONAL MEMBERS 


ABERDEEN PROVING GROUNDS, BALLISTIC RESEARCH LABORATORIES, Aberdeen, Maryland

AEROJET-GENERAL CORPORATION, P. O. Box 296, Azusa, California

AMERICAN VISCOSE CORPORATION, Marcus Hook, Pennsylvania

ATLANTIC REFINING COMPANY, 2700 Passyunk Avenue, Philadelphia, Pa.

BELL TELEPHONE LABORATORIES, INC., TECHNICAL LIBRARY, 463 West Street, New York 14, New York

BENDIX AVIATION CORPORATION, 1200 Fisher Bldg., Detroit, Michigan

BOEING AIRPLANE COMPANY, Box 3707, Seattle, Washington

CALIFORNIA RESEARCH CORPORATION, P. O. Box 1627, Richmond, California

CASE INSTITUTE OF TECHNOLOGY, STATISTICAL LABORATORY, Cleveland 6, Ohio

CATHOLIC UNIVERSITY OF AMERICA, STATISTICAL LABORATORY, MATHEMATICS DEPARTMENT, Washington, D. C.

C-E-I-R, INC., 1200 Jefferson Davis Highway, Arlington 2, Virginia

COLUMBIA UNIVERSITY, DEPARTMENT OF MATHEMATICAL STATISTICS, New York 27, N. Y.

CORNELL UNIVERSITY, MATHEMATICS DEPARTMENT, Ithaca, New York

FORD MOTOR COMPANY, P. O. Box 2053, Dearborn, Michigan

GENERAL ELECTRIC COMPANY, ve C37, Room 248, Schenectady, New York

INDIANA UNIVERSITY, THE LIBRARY, Bloomington, Indiana

INTERNATIONAL BUSINESS MACHINES CORPORATION, MATHEMATICS AND APPLIED SCIENCE LIBRARY, 1271 Avenue of the Americas, New York 20, N. Y.

IOWA STATE UNIVERSITY, STATISTICAL LABORATORY, Ames, Iowa

LOCKHEED AIRCRAFT CORPORATION, ENGINEERING LIBRARY, Burbank, California

MICHIGAN STATE UNIVERSITY, DEPARTMENT OF STATISTICS, East Lansing, Michigan

MINNESOTA MINING AND MANUFACTURING COMPANY, APPLIED MATHEMATICS AND STATISTICS, St. Paul, Minnesota

MONSANTO CHEMICAL COMPANY, 800 North Lindbergh Blvd., St. Louis 66, Missouri

NATIONAL CASH REGISTER COMPANY, RESEARCH DEPARTMENT, Main and K Streets, Dayton 9, Ohio

NATIONAL SECURITY AGENCY, Fort George G. Meade, Maryland

NORTHWESTERN UNIVERSITY, DEPARTMENT OF MATHEMATICS, Evanston, Illinois

PRINCETON UNIVERSITY, DEPARTMENT OF MATHEMATICS, SECTION OF MATHEMATICAL STATISTICS, Princeton, New Jersey

PURDUE UNIVERSITY LIBRARIES, Lafayette, Indiana

RADIO CORPORATION OF AMERICA, R.C.A. LABORATORIES LIBRARY, Princeton, New Jersey

RAMO-WOOLDRIDGE CORPORATION, Los Angeles, California













REMINGTON RAND, UNIVAC DIVISION, 315 Park Avenue South, New York 10, N. Y.

SANDIA CORPORATION, Sandia Base, Albuquerque, New Mexico

SOCONY MOBIL OIL COMPANY, INC., 150 E. 42nd Street, New York 17, New York

SOUTHERN METHODIST UNIVERSITY, MATHEMATICS DEPARTMENT, Dallas 5, Texas

SPACE TECHNOLOGY LABORATORIES, P. O. Box 95001, Los Angeles 45, California

STANFORD UNIVERSITY, GIRSHICK MEMORIAL LIBRARY, Stanford, California

STATE UNIVERSITY OF IOWA, Iowa City, Iowa

UNION CARBIDE CORPORATION, 30 East 42nd Street, New York 17, New York

UNION OIL COMPANY OF CALIFORNIA, UNION RESEARCH CENTER, Box 76, Brea, California

UNITED STATES STEEL CORPORATION LIBRARY, Monroeville, Penna.

UNIVERSITY OF CALIFORNIA, STATISTICAL LABORATORY, Berkeley, California

UNIVERSITY OF ILLINOIS, SERIALS DEPARTMENT, Urbana, Illinois

UNIVERSITY OF NORTH CAROLINA, DEPARTMENT OF STATISTICS, Chapel Hill, North Carolina

UNIVERSITY OF PUERTO RICO, SCHOOL OF TROPICAL MEDICINE, San Juan, Puerto Rico

UNIVERSITY OF WASHINGTON, LABORATORY OF STATISTICAL RESEARCH, Seattle, Washington

W. R. GRACE AND COMPANY, RESEARCH DIVISION, Washington Research Center, Clarksville, Maryland

W. R. GRACE AND COMPANY, DEWEY AND ALMY CHEMICAL DIVISION, 62 Whittemore Avenue, Cambridge 40, Massachusetts








OUR SILVER ANNIVERSARY 


By ALLEN T. CRAIG
University of Iowa 


Prior to 1920, a scant half-dozen American colleges and universities had, as a
member of the department of mathematics, anyone who was seriously interested 
in a newly developing method of scientific inference called Mathematical Statis- 
tics. In the decade that followed, spurred perhaps in part by the first World War, 
there was a marked increase in the number of graduate students of mathematics 
who found mathematical statistics to be a challenging and rewarding field of 
study. But the problem of publication was quite acute. On the one hand, the 
relatively large American Statistical Association was, at that time, quite effec- 
tively dominated by persons who vigorously objected to having their Journal 
cluttered up with a lot of meaningless symbols. On the other hand, the august 
American Mathematical Society took a very dim view of the whole business 
and looked upon these mavericks with a suspicion of quackery. Although most 
mathematical statisticians were members of both of these societies, it was fairly 
clear that access to the publications of these societies was too restrictive to 
represent a healthy situation. In a rare and generous move, Harry C. Carver 
founded and personally financed a new journal that he named The Annals of 
Mathematical Statistics. Volume One appeared in 1930. In fairness to the Ameri- 
can Statistical Association, it should be remarked that a few years later the 
Annals became affiliated with that Society. 

By 1934 there was a group of reasonable size (made up of persons in govern- 
ment, in industry, and in the colleges and universities) that felt the interests of 
mathematical statistics could better be served if we had a society and a journal 
of our own. The Editor volunteered to make his Annals the official journal of 
such a society and in fact to turn over the publication of the Annals to the new 
society as soon as it was able to carry the burden. Preliminary conversations and 
correspondence concerning the organization of a society of mathematical statis- 
tics soon showed that people were far from unanimous as to what should be the 
nature of the organization. Some thought of a statistician of that day as a 
specialist much like an actuary; and accordingly it was urged that membership 
in the society should be graded and should be awarded on the basis of written 
examinations. A survey revealed the obvious: that practically everyone would 
be willing to give examinations but virtually no one would take them. A com- 
promise was worked out whereby the general plan of organization of the new 
society would be along the lines of the Mathematical Society but there would be 
two grades of membership, namely, Members and Fellows. Thus, on September 
12, 1935, at Ann Arbor a constitution and by-laws were adopted and The Insti- 
tute of Mathematical Statistics was formally organized. Henry Lewis Rietz was 
chosen to be the first president. 


Received July 9, 1960. 










I have often regretted our yielding to the notion of having two grades of 
membership. When one scans the current Directory and finds listed as Members 
many highly talented persons, he too must share some misgivings about the
fairness and the wisdom of these designations. 

During these years the Mathematical Society was gradually mellowing and it 
was possible for the Institute to hold its first post-organizational meeting jointly 
with that Society on January 2, 1936, in St. Louis. The meeting consisted of 
exactly one session at which four contributed papers were read and one invited 
address was given. Perhaps the most significant feature of the meeting was the 
decision to request the Society to meet jointly again with the Institute at Cam- 
bridge in September of 1936. 

It is my opinion that the Cambridge meeting aided materially in the rapid 
development of the Institute. Held as it was in connection with the Harvard 
Tercentenary Celebration, this meeting provided an opportunity for the Institute 
to bring itself and its purposes to the attention of an audience with broad and 
varied interests. The acceptance, by other societies, of the Institute as a scholarly 
society itself was certainly accelerated by this meeting. 

With stability assured, the Institute could now employ the energy and talent 
of its members to attack some of the problems of the day. A particularly acute 
problem at that time was the state of the teaching of statistics in the colleges 
and universities of the United States. Under the able and aggressive leadership
of one of our members, constructive suggestions on the improvement of the 
teaching of statistics were formulated, were officially endorsed by the Institute, 
and were given wide circulation. The Institute should be credited with having 
focused academic attention on this problem. To be sure, other forces were at 
work at the same time, but the action of the Institute seems to have provided 
the initial jolt. In any event, the teaching of statistics has shown steady improve- 
ment and, after all, that is what the Institute wanted. 

Meanwhile the second World War was under way in Europe and in this coun- 
try one heard a great deal about national defense. In the summer of 1940 the 
Institute appointed a war preparedness committee. The chairman of this com- 
mittee prepared an excellent report on where and how mathematical statisticians 
should be used in preparing for or prosecuting a war in that era. This report was 
brought to the attention of our government agencies. Perhaps some of you were 
as favorably impressed as I was with the close parallel between the actual utiliza- 
tion (in general) of mathematical statisticians during the second World War and 
the recommendations of the Institute. 

In 1946, the President of the Institute announced that the first Rietz Memorial 
Lecture would be delivered at the Yale meeting in September, 1947. The lec- 
turer was Abraham Wald and the subject was “Sequential Estimation and 
Multi-Decisions.” It was a tragic and cruel twist of fate that a few years later 
the name of the first Rietz lecturer should be used to establish a second series 
of memorial lectures. 

The events of the past decade are so recently in mind that I shall not dwell
upon them. The Institute of Mathematical Statistics, now a powerful and
influential force in science in the United States, still holds that support and 
encouragement of teaching and research in mathematical statistics are its primary 
aims. However, many members of the Institute not only have continued their 
support of the numerous applications of statistics that they formerly supported 
but in recent years others have actually enlarged the scope of that support. This, 
I think, is particularly true in the social and behavioral sciences. 

Of the many achievements of the Institute, the one that is to me the greatest 
source of pride is the change in the stature of its journal, the Annals. In 1935 the 
list of subscribers to this journal consisted of the names of 98 libraries and 118 
individuals. In accordance with the 1934-35 agreement, the Institute took over 
the publication of the Annals beginning with the June, 1938 issue. The first 
editor appointed by the Institute served from that time through December, 
1949, and the present editor is the fifth to be appointed. An editor of a scientific 
journal bears a heavy responsibility. He can make or break a journal; and as a 
journal goes, so goes the organization that supports it. 

Some manuscripts present such important new results, viewpoints, or solutions of outstanding problems that they do not present difficult editorial decisions. But most manuscripts make smaller contributions: they complete the development of a theory or point out interesting facts that have been overlooked. All kinds of manuscripts are important in that they represent a continued intellectual interest by those who are responsible for the teaching of statistics, the training of future mathematical statisticians, and the carrying out of statistical research. There are few manuscripts that present exceptionally important new results, solutions, and viewpoints, while there are many that present less outstanding material. Yet I firmly believe that only editors who appreciate the importance of a constant flow of all kinds of contributions can build great journals that will remain great. Our editors have measured up to this. And on this twenty-fifth anniversary of the founding of the Institute of Mathematical Statistics, I offer to each of our five editors our sincere thanks for the effort he has put forth to make the Annals, and inferentially the Institute, incomparable.








SIMPLEX-SUM DESIGNS: A CLASS OF SECOND ORDER ROTATABLE
DESIGNS DERIVABLE FROM THOSE OF FIRST ORDER¹


By G. E. P. BOX AND D. W. BEHNKEN
University of Wisconsin and American Cyanamid Company 
1.0. Introduction. A functional relationship $\eta = g(\xi_1, \xi_2, \ldots, \xi_k) = g(\boldsymbol{\xi})$ is assumed to exist between a response $\eta$ and $k$ continuous variables $\xi_1, \xi_2, \ldots, \xi_k$. To elucidate certain aspects of this relationship, measurements of $\eta$ are to be made for each of $N$ combinations of the levels of the variables

$$\boldsymbol{\xi}_u' = (\xi_{1u}, \xi_{2u}, \ldots, \xi_{ku}), \qquad u = 1, 2, \ldots, N.$$

The problem of experimental design considered is the choice of the design matrix $D$ of $N$ rows and $k$ columns whose $u$th row is $\boldsymbol{\xi}_u'$, which specifies the levels of the variables to be used in each of the $N$ trials. The design matrix can be regarded as specifying the coordinates of $N$ experimental points in the $k$-dimensional space of the variables. As mentioned, for example, in [1], [2], [3], and [4], a number of distinct problems can arise. Here we suppose, as in [5], that the nature of the functional relationship $g(\boldsymbol{\xi})$ is unknown but that over a specific region $R$ in the space of the variables $\xi_1, \xi_2, \ldots, \xi_k$ a polynomial of degree $d$, $f_d(\boldsymbol{\xi})$, adequately graduates the function $g(\boldsymbol{\xi})$, and the objective is to use the polynomial to estimate $\eta$ within the region $R$. A design of order $d$ is one that allows the estimation of the polynomial $f_d(\boldsymbol{\xi})$. In this paper we shall be particularly concerned with the case $d = 2$, that is, with the fitting of a polynomial of second degree. Specifically, we shall develop a method of obtaining rotatable designs of second order from those of first order. Before defining rotatable designs, it may be appropriate to discuss briefly why they are thought to be useful.

Received November 7, 1958; revised April 20, 1960.
¹ Prepared in part at the Statistical Techniques Research Group, Princeton University, Princeton, under the Office of Ordnance Research, Contract No. DA-36-034-ORD 2297, and in part at the Department of Experimental Statistics, North Carolina State College, Raleigh.

A general design may be expressed in terms of standardized variables $x_i$, for which

$$\sum_{u=1}^{N} x_{iu} = 0 \quad \text{and} \quad N^{-1} \sum_{u=1}^{N} x_{iu}^2 = \lambda_2, \qquad i = 1, 2, \ldots, k,$$

where $\lambda_2$ is a convenient constant. In actual application, therefore, the levels of the experimental variables $\xi_i$ are given by $\xi_{iu} = S_i x_{iu} + \xi_{i0}$, where $\xi_{i0}$ and $S_i$ are suitably chosen so as to give appropriate location and spread to the design in the particular application. We shall suppose that the functional relationship is to be estimated by standard least squares.
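The two standardization conditions and the mapping back to natural units fit in a few lines of Python. This is only an illustration. The helper names `is_standardized` and `to_natural_units` are hypothetical, and the centers and spreads below are made-up values standing in for $\xi_{i0}$ and $S_i$.

```python
def is_standardized(design, lam2, tol=1e-9):
    """Check sum_u x_iu = 0 and N^{-1} sum_u x_iu^2 = lam2 for every column i."""
    N, k = len(design), len(design[0])
    for i in range(k):
        col = [row[i] for row in design]
        if abs(sum(col)) > tol:
            return False
        if abs(sum(x * x for x in col) / N - lam2) > tol:
            return False
    return True


def to_natural_units(design, centers, spreads):
    """Map coded levels x_iu to natural levels xi_iu = S_i * x_iu + xi_i0."""
    return [[s * x + c for x, c, s in zip(row, centers, spreads)] for row in design]


# A 2^2 factorial in coded units: each column sums to 0 and has mean square 1.
coded = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
print(is_standardized(coded, lam2=1.0))  # True

# Hypothetical centres xi_i0 = (150, 30) and spreads S_i = (10, 5).
print(to_natural_units(coded, centers=[150.0, 30.0], spreads=[10.0, 5.0]))
```

The second call returns the four runs at $(140, 25)$, $(160, 25)$, $(140, 35)$ and $(160, 35)$.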


A polynomial of degree one in the $x$'s may be written

$$\eta_u = \sum_{i=0}^{k} \beta_i x_{iu},$$

where $x_{0u} = 1$, $u = 1, 2, \ldots, N$, or in matrix notation $\boldsymbol{\eta} = X\boldsymbol{\beta}$, where $X$ is the $N \times (k + 1)$ matrix

$$X = [\mathbf{1} : D]$$

and $\mathbf{1}$ is an $N \times 1$ column vector with all its elements unity. Whether one's objective is to obtain minimum variance for the estimated linear coefficients, minimum volume of the confidence region for the coefficients, or minimum volume of the confidence cone for the direction of steepest ascent, one is led to the simple conclusion that the most desirable design is orthogonal, that is, it is such that $X'X = N\Lambda$, where $\Lambda$ is a diagonal matrix with its first diagonal element equal to unity and its remaining diagonal elements equal to $\lambda_2$.
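The short Python sketch below makes the orthogonality condition concrete. It builds one possible first order design with the minimum number of runs $n = k + 1$ and forms $X'X$. The design is made from scaled Helmert contrasts, which is our own choice of construction and not one taken from the paper. The helper names are illustrative. For this design $N = k + 1$ and $\lambda_2 = 1$, so $X'X$ should equal $N$ times the identity.

```python
import math


def simplex_design(k):
    """One possible first order simplex design in k factors (n = k + 1 runs).

    Column j is a Helmert contrast (j ones, then -j, then zeros) rescaled so that
    its sum of squares is n; every column is orthogonal to 1 and to the others,
    so lambda_2 = n^{-1} * sum_u x_iu^2 = 1.
    """
    n = k + 1
    cols = []
    for j in range(1, k + 1):
        col = [1.0] * j + [-float(j)] + [0.0] * (n - j - 1)
        scale = math.sqrt(n / (j * (j + 1)))  # sum of squares of col is j + j^2
        cols.append([c * scale for c in col])
    return [[cols[i][u] for i in range(k)] for u in range(n)]


def cross_product(design):
    """Return X'X for X = [1 : D]."""
    X = [[1.0] + list(row) for row in design]
    p = len(X[0])
    return [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]


D = simplex_design(3)   # N = 4 runs, lambda_2 = 1
XtX = cross_product(D)
for row in XtX:
    print(" ".join(f"{v:6.2f}" for v in row))
```

Up to rounding, the printed matrix is $4I$, that is, $N\Lambda$ with $\Lambda = I$.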

Often we are not particularly interested in estimating the individual coefficients $\beta_i$ but in estimating the polynomial itself. Suppose a design has been carried out which allows us to fit the polynomial by least squares. Using the fitted polynomial, the estimated response at the conditions $\mathbf{x}' = [x_1, x_2, \ldots, x_k]$ is denoted by $\hat{y}_{\mathbf{x}}$. If a polynomial of the degree assumed can exactly represent $g(\mathbf{x})$, then

$$E(\hat{y}_{\mathbf{x}}) = \eta_{\mathbf{x}},$$

and a measure of the accuracy of our estimation over the region of interest $R$ is provided by $V(\hat{y}_{\mathbf{x}})$.

It is easy to show that a first order orthogonal design has the property that $V(\hat{y}_{\mathbf{x}})$ is a function of $\mathbf{x}'\mathbf{x} = \sum x_i^2$ and $\lambda_2$ alone:

$$V(\hat{y}_{\mathbf{x}}) = \phi(\mathbf{x}'\mathbf{x}, \lambda_2).$$
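The claim can be checked numerically. With error variance $\sigma^2$, least squares gives $V(\hat{y}_{\mathbf{x}})/\sigma^2 = \mathbf{f}'(X'X)^{-1}\mathbf{f}$, where $\mathbf{f}' = (1, \mathbf{x}')$. When $X'X = N\Lambda$ this reduces to $N^{-1}(1 + \mathbf{x}'\mathbf{x}/\lambda_2)$. The pure-Python sketch below evaluates the general formula at several points on a circle for a two-factor simplex design. The helpers `simplex_design`, `solve` and `scaled_variance` are illustrative, not from the paper.

```python
import math


def simplex_design(k):
    """Simplex design in k factors from scaled Helmert contrasts (lambda_2 = 1)."""
    n = k + 1
    cols = []
    for j in range(1, k + 1):
        col = [1.0] * j + [-float(j)] + [0.0] * (n - j - 1)
        scale = math.sqrt(n / (j * (j + 1)))
        cols.append([c * scale for c in col])
    return [[cols[i][u] for i in range(k)] for u in range(n)]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(map(float, A[r])) + [float(b[r])] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(m):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][m] for r in range(m)]


def scaled_variance(design, x):
    """V(y_hat_x) / sigma^2 = f' (X'X)^{-1} f for the first degree model, f = (1, x)."""
    X = [[1.0] + list(row) for row in design]
    p = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    f = [1.0] + list(x)
    return sum(fi * zi for fi, zi in zip(f, solve(XtX, f)))


D = simplex_design(2)   # N = 3, lambda_2 = 1
r = 0.8
for theta in (0.0, 1.0, 2.5):
    x = (r * math.cos(theta), r * math.sin(theta))
    # Both columns should agree: the variance depends on x only through x'x.
    print(round(scaled_variance(D, x), 10), round((1 + r * r) / 3, 10))
```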

For such a design, therefore, this variance (and hence the reciprocal of the variance, which can be regarded as a measure of the information supplied by the design about the response surface) is constant on circles, spheres or hyperspheres in the factor space, i.e., in the space of the variables $x_1, x_2, \ldots, x_k$. Designs which have the property of generating spherical variance contours are called rotatable designs. It is easily shown for first order designs that the converse proposition is true, that is, in order to ensure rotatability the design must be orthogonal. As is pointed out in [5], the criterion of orthogonality, which has a central place for the first order design, is not readily extendable to designs of higher order. We can, however, readily extend the property of rotatability to designs of higher order, and it is found that in general for a design of order $d$ it is possible to choose a design such that

$$V(\hat{y}_{\mathbf{x}}) = \phi(\mathbf{x}'\mathbf{x}, \lambda_2, \lambda_4, \ldots, \lambda_{2d}),$$

where the $\lambda_j$ are constants at our choice.













To ensure the design is of this form it is only necessary to arrange that the moments of the design up to order $2d$ shall have certain values. For the case of second order designs, with which we are specifically concerned,

$$V(\hat{y}_{\mathbf{x}}) = \phi(\mathbf{x}'\mathbf{x}, \lambda_2, \lambda_4),$$

where $\lambda_2$ and $\lambda_4$ are at our choice. $\lambda_2$ is merely a scaling factor, while $\lambda_4$ is chosen to give a satisfactory variance profile along a radius vector.
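The second order property can be seen numerically in the pure-Python sketch below. It fits the full second degree model in two factors to two familiar designs and compares $V(\hat{y}_{\mathbf{x}})/\sigma^2$ at points on a circle of radius 0.7. The first design is the six points of a regular hexagon plus one center point. The second is the $3^2$ factorial, included only as a non-rotatable contrast. All helper names are illustrative.

```python
import math


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(map(float, A[r])) + [float(b[r])] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(m):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][m] for r in range(m)]


def quad_terms(x1, x2):
    """Regressors of the full second degree model in two factors."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def scaled_variance(points, x):
    """f' (X'X)^{-1} f for the second degree model at the point x."""
    X = [quad_terms(*p) for p in points]
    p = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    f = quad_terms(*x)
    return sum(fi * zi for fi, zi in zip(f, solve(XtX, f)))


def angular_spread(points, r, angles):
    """Largest minus smallest scaled variance over points at distance r from the origin."""
    v = [scaled_variance(points, (r * math.cos(t), r * math.sin(t))) for t in angles]
    return max(v) - min(v)


hexagon = [(math.cos(j * math.pi / 3), math.sin(j * math.pi / 3)) for j in range(6)] + [(0.0, 0.0)]
factorial_3x3 = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
angles = [0.1 * j for j in range(63)]

print(angular_spread(hexagon, 0.7, angles))        # essentially zero: rotatable
print(angular_spread(factorial_3x3, 0.7, angles))  # clearly positive: not rotatable
```

For the $3^2$ factorial the variance on the circle works out to a constant minus $0.75\,x_1^2 x_2^2$. It therefore changes by about $0.045$ between the axes and the diagonals.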

The problems for which the designs we are discussing have particular applica- 
tion are those where we are gaining knowledge of certain features of an unknown 
functional relationship by a sequential process in which any one “design” is only
a single step. The results of each such step are used to more effectively plan the 
next group of observations. 

At a particular stage we are interested in the behavior of the response function
“in the neighborhood” $R$ of some particular point $P$. We have in mind that the
operability region O, that is the region in the space of the variables in which ex- 
periments could be conducted, is fairly extensive and that P is not close to the 
boundary of O. We suppose that the neighborhood of interest about P is a region 
R which nowhere reaches the boundary of O and that scales, metrics and trans- 
formations are chosen either implicitly or explicitly such that R is very approxi- 
mately spherical and is centered at P. 

The science of designing experiments is principally a convenient way of giving expression to prior information about the experimental situation which is currently in the experimenter's mind, and of utilizing this information so as to generate further information most likely to be of value. The prior information is expressed in the choice of metrics, scales and transformations employed and is based on the experimenter's current feelings concerning the nature of the function under study. To the extent that the choices are poor, the extra information obtained about the nature of the function after the next set of observations has been completed will be less than might otherwise have been obtained. This would mean that a sequence of such experiments, in which the information gained at each stage is utilized to design further more effective experiments, would be somewhat longer when prior information was less. This, of course, is to be expected and reflects the fact that the apparent indeterminacy is a property of the experimental problem of exploring unknown functions itself, rather than of a particular technique for solving it. To demonstrate that some such rationale as the above is necessary, one should remember that any set of experimental points distributed through the factor space such that $X$ is of rank $k + 1$ provides a first order orthogonal design in some set of transformed $x$'s.
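The last remark can be illustrated directly with the Python sketch below, which starts from an irregular set of points for which $X$ has rank $k + 1$. It centers each column and applies Gram-Schmidt orthogonalization, which amounts to an affine change of variables. In the transformed coordinates the design satisfies $X'X = N\Lambda$ with $\lambda_2 = 1$. The helper names and the point set are made up for illustration, and Gram-Schmidt is just one convenient transformation among many.

```python
import math


def orthogonalize(points, lam2=1.0):
    """Affine change of variables turning the points into a first order orthogonal design.

    Each column is centred (so it is orthogonal to the column of ones), the columns are
    Gram-Schmidt orthonormalized, and then scaled so that N^{-1} sum_u z_iu^2 = lam2.
    Raises ValueError when X = [1 : D] does not have rank k + 1.
    """
    N, k = len(points), len(points[0])
    basis = []
    for i in range(k):
        col = [p[i] for p in points]
        mean = sum(col) / N
        c = [v - mean for v in col]
        for b in basis:
            proj = sum(x * y for x, y in zip(c, b))
            c = [x - proj * y for x, y in zip(c, b)]
        norm = math.sqrt(sum(x * x for x in c))
        if norm < 1e-12:
            raise ValueError("X does not have rank k + 1")
        basis.append([x / norm for x in c])
    scale = math.sqrt(N * lam2)
    return [[scale * basis[i][u] for i in range(k)] for u in range(N)]


def cross_product(design):
    """Return X'X for X = [1 : D]."""
    X = [[1.0] + list(row) for row in design]
    p = len(X[0])
    return [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]


# Hypothetical, deliberately irregular set of N = 5 points in k = 2 factors.
points = [(0.0, 0.0), (1.0, 0.0), (0.3, 2.0), (2.0, 1.5), (1.0, 1.0)]
Z = orthogonalize(points)
for row in cross_product(Z):
    print(" ".join(f"{v:6.2f}" for v in row))  # approximately 5 times the identity
```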

The discussion so far has been based on the nature of the variance function $V(\hat{y}_{\mathbf{x}}) = E[\hat{y}_{\mathbf{x}} - E(\hat{y}_{\mathbf{x}})]^2$. In practice it would seldom, if ever, be true that the polynomial would provide an exact representation of the unknown function, and in a more recent paper [2] this assumption has been dropped. Designs which minimize the mean square error $E(\hat{y}_{\mathbf{x}} - \eta_{\mathbf{x}})^2$ are considered instead. Now

$$E(\hat{y}_{\mathbf{x}} - \eta_{\mathbf{x}})^2 = V(\hat{y}_{\mathbf{x}}) + [E(\hat{y}_{\mathbf{x}}) - \eta_{\mathbf{x}}]^2,$$

where the additional term on the right hand side may be called the squared bias. A general theorem in the above paper shows that if we are fitting a polynomial of degree $d_1$ over a region $R$ when a polynomial of higher degree $d_2$ is necessary to give an exact representation, then the value of the squared bias averaged over the region is minimized when the moments of the design points are the same as those of a uniform distribution over the region $R$. If it seems plausible, in accordance with the previous discussion, that the region of interest should be regarded as spherical, then the optimum design to minimize average squared bias is also a particular rotatable design.
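The moments of the uniform distribution over a unit $k$-ball show why a spherical region leads to a rotatable design. By symmetry all odd moments vanish. The standard results $E(x_i^2) = 1/(k+2)$, $E(x_i^4) = 3/[(k+2)(k+4)]$ and $E(x_i^2 x_j^2) = 1/[(k+2)(k+4)]$ give the fourth-order ratio $[iiii]/[iijj] = 3$ that rotatable second order designs also have. The Python sketch below compares these closed forms with a Monte Carlo estimate. It is an illustration only: the seed and sample size are arbitrary, and the helper names are our own.

```python
import math
import random


def uniform_ball_moments(k):
    """Closed-form E(x_1^2), E(x_1^4), E(x_1^2 x_2^2) for the uniform distribution on the unit k-ball."""
    m2 = 1.0 / (k + 2)
    m4 = 3.0 / ((k + 2) * (k + 4))
    m22 = 1.0 / ((k + 2) * (k + 4))
    return m2, m4, m22


def monte_carlo_moments(k, n_samples=100_000, seed=1):
    """Monte Carlo estimates of the same three moments."""
    rng = random.Random(seed)
    s2 = s4 = s22 = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(k)]   # isotropic direction
        norm = math.sqrt(sum(v * v for v in g))
        radius = rng.random() ** (1.0 / k)            # P(R <= r) = r^k inside the ball
        x = [radius * v / norm for v in g]
        s2 += x[0] ** 2
        s4 += x[0] ** 4
        s22 += x[0] ** 2 * x[1] ** 2
    return s2 / n_samples, s4 / n_samples, s22 / n_samples


k = 3
exact = uniform_ball_moments(k)
approx = monte_carlo_moments(k)
print("exact :", [round(v, 4) for v in exact], "ratio", round(exact[1] / exact[2], 6))
print("approx:", [round(v, 4) for v in approx], "ratio", round(approx[1] / approx[2], 2))
```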


2.0. Outline. If we accept then that rotatable designs are of interest it be- 
comes necessary to discover how they may be obtained in practice. First order 
rotatable designs are readily obtained (they are simply the orthogonal designs) 
but useful second order designs are less easy to derive. The method used here for 
obtaining second order rotatable designs from those of first order will now be 
outlined. In what follows we shall use n for the number of points in a first order 
design, and N for the size of a general or higher order derived design. 

Fig. 1b. Generated second order rotatable design for two factors.

Fig. 2b. Generated second order rotatable design for three factors.

In the fitted first degree equation there are $k + 1$ constants; consequently at least $k + 1$ observations must be made if the constants are to be separately
estimable. Suppose the first order orthogonal (rotatable) design is used with the minimum number ($n = k + 1$) of experimental points. Then it is easily shown [6] that in the space of the $x$'s these points lie at the vertices of a regular simplex. For example, if $k = 2$ the points are at the vertices of an equilateral triangle; if $k = 3$, at the corners of a regular tetrahedron. They can thus be called first order simplex designs. Now it can be observed that certain of the useful second order designs which have been found bear an interesting relation (illustrated for $k = 2$ and $k = 3$ in Figures 1 and 2) to the first order simplex designs. The three points at the vertices of an equilateral triangle in Figure 1a, when joined to the origin at the center of the triangle, define three vectors. By adding these vectors two at a time we obtain a second equilateral triangle; by adding the vectors three at a time we obtain a center point. The original set of points plus the derived points generate the design shown in Figure 1b. This is the so-called hexagonal design, which is known to be a second order rotatable design [5]. The corresponding four vectors from the origin to the vertices of a tetrahedron shown in Figure 2a (a first order orthogonal or rotatable design) have the following sums. Added in all possible ways two at a time, they generate six further vectors passing through the midpoints of the edges of the tetrahedron. Added in all possible ways three at a time, they generate four vectors passing through the midpoints of the faces of the original tetrahedron. Added four at a time, they generate a center point. If the lengths of the derived vectors are suitably chosen, the resulting design coincides precisely with a previously derived second order rotatable design, namely the central composite rotatable design [5], [7]. These derived designs will be called simplex-sum designs.
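The construction can be reproduced in a few lines. The Python sketch below forms all sums of $s$ vertex vectors for $k = 2$ and $k = 3$ and reports the distance of each derived point from the origin. No radius multipliers are applied. The helper names are our own, and building the simplex from scaled Helmert contrasts fixes one arbitrary orientation.

```python
import itertools
import math


def simplex_design(k):
    """Simplex design in k factors from scaled Helmert contrasts (rows sum to zero)."""
    n = k + 1
    cols = []
    for j in range(1, k + 1):
        col = [1.0] * j + [-float(j)] + [0.0] * (n - j - 1)
        scale = math.sqrt(n / (j * (j + 1)))
        cols.append([c * scale for c in col])
    return [[cols[i][u] for i in range(k)] for u in range(n)]


def simplex_sum_points(k):
    """All sums of s rows of the simplex design, s = 1, ..., k + 1, tagged with s.

    s = k + 1 gives the centre point, because the rows of the simplex design sum to zero.
    """
    D1 = simplex_design(k)
    points = []
    for s in range(1, k + 2):
        for rows in itertools.combinations(D1, s):
            points.append((s, [sum(r[i] for r in rows) for i in range(k)]))
    return points


def radii_by_s(points):
    """Distinct distances from the origin (rounded) for each value of s."""
    out = {}
    for s, p in points:
        out.setdefault(s, set()).add(round(math.sqrt(sum(v * v for v in p)), 6))
    return out


pts2, pts3 = simplex_sum_points(2), simplex_sum_points(3)
print(len(pts2), radii_by_s(pts2))
print(len(pts3), radii_by_s(pts3))
```

For $k = 2$ the six non-center points share one radius and form the hexagon. For $k = 3$ the $s = 2$ points lie at radius 2 along three mutually perpendicular axes. The $s = 1$ and $s = 3$ points, at radius $\sqrt{3}$, form a cube. Rescaling these groups relative to each other recovers the central composite arrangement.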

In this paper we first demonstrate that the method suggested by these two examples for generating simplex-sum second order rotatable designs containing $2^n - 1$ points, from the first order rotatable simplex design containing $n = k + 1$ points, is a general one. For $k \geq 5$ the number of points required by this method becomes large compared to the number of constants to be determined (for $k = 5$, for instance, $2^6 - 1 = 63$ points against the 21 constants of a second degree polynomial in five variables). A method is given for generating “fractions” and “replicated fractions” of the derived designs which have all the required properties and hence overcome this difficulty.
Finally it is shown how the designs may be arranged in blocks so that they may 
be utilized in circumstances where insufficient homogeneous experimental 
material is available to complete the full quota of experimental runs. 

To illustrate the method a second order rotatable design for seven variables 
is obtained, requiring only 66 experimental runs and only using three levels of 
each variable. 


3.0. General Theory. 
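3.1. Conditions for Rotatability. We now define the design matrix $D$ for the $k$ standardized factors $x_1, x_2, \ldots, x_k$ as an $N \times k$ matrix whose $u$th row

$$\mathbf{x}_u' = (x_{1u}, x_{2u}, \ldots, x_{ku})$$

defines the coded factor levels to be used in the $u$th of $N$ experiments called for by the design. The general moment of the design will be denoted by

$$[1^{\alpha_1} 2^{\alpha_2} \cdots k^{\alpha_k}] = N^{-1} \sum_{u=1}^{N} x_{1u}^{\alpha_1} x_{2u}^{\alpha_2} \cdots x_{ku}^{\alpha_k},$$

and $\alpha = \sum \alpha_i$ will be called the order of the moment. The problem of finding rotatable designs is in essence one of finding configurations of points possessing the proper moments. It is in fact shown in [5] that when fitting the model

$$\eta = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \sum_{j=i}^{k} \beta_{ij} x_i x_j + \sum_{i=1}^{k} \sum_{j=i}^{k} \sum_{l=j}^{k} \beta_{ijl} x_i x_j x_l + \cdots$$

including all terms through degree $d$, a rotatable design will be obtained when the moments through order $2d$ are of the form

$$[1^{\alpha_1} 2^{\alpha_2} \cdots k^{\alpha_k}] = \begin{cases} \lambda_\alpha \dfrac{\prod_{i=1}^{k} \alpha_i!}{2^{\alpha/2} \prod_{i=1}^{k} (\alpha_i/2)!}, & \text{all } \alpha_i \text{ even}, \\[1ex] 0, & \text{any } \alpha_i \text{ odd}, \end{cases} \tag{3.1}$$

where $\lambda_\alpha$ is a constant for any given design and order $\alpha$.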
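Condition (3.1) lends itself to a direct check, sketched below in pure Python with illustrative helper names. The sketch computes every design moment through a chosen order. It estimates each $\lambda_\alpha$ from the pure moment $[1^\alpha]$ of the first factor, then compares all other moments with the template. Note that (3.1) gives $[ii] = \lambda_2$, $[iiii] = 3\lambda_4$ and $[iijj] = \lambda_4$. The check is applied to the hexagonal design with one center point and, as a contrast, to the $3^2$ factorial.

```python
import itertools
import math


def design_moment(points, alphas):
    """[1^a1 2^a2 ... k^ak] = N^{-1} sum_u prod_i x_iu^{a_i}."""
    return sum(math.prod(x ** a for x, a in zip(p, alphas)) for p in points) / len(points)


def template_factor(alphas):
    """prod(a_i!) / (2^{alpha/2} prod((a_i/2)!)) from (3.1); None if any a_i is odd."""
    if any(a % 2 for a in alphas):
        return None
    alpha = sum(alphas)
    num = math.prod(math.factorial(a) for a in alphas)
    den = 2 ** (alpha // 2) * math.prod(math.factorial(a // 2) for a in alphas)
    return num / den


def is_rotatable(points, order, tol=1e-9):
    """True if every design moment of order 1..order has the form (3.1)."""
    k = len(points[0])
    lam = {}
    for alpha in range(2, order + 1, 2):
        pure = (alpha,) + (0,) * (k - 1)
        lam[alpha] = design_moment(points, pure) / template_factor(pure)
    for alphas in itertools.product(range(order + 1), repeat=k):
        alpha = sum(alphas)
        if not 1 <= alpha <= order:
            continue
        factor = template_factor(alphas)
        target = 0.0 if factor is None else lam[alpha] * factor
        if abs(design_moment(points, alphas) - target) > tol:
            return False
    return True


hexagon = [(math.cos(j * math.pi / 3), math.sin(j * math.pi / 3)) for j in range(6)] + [(0.0, 0.0)]
factorial_3x3 = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
print(is_rotatable(hexagon, order=4))        # True
print(is_rotatable(factorial_3x3, order=4))  # False
```

For the factorial, $[1^4] = 2/3$ implies $\lambda_4 = 2/9$, but $[1^2 2^2] = 4/9$, so the mixed fourth moment breaks the pattern.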













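3.2. Notation and Definition. Consider a first order orthogonal (simplex) design in $k = n - 1$ variables with design matrix $D_1$, and with $\sum_{u=1}^{n} x_{iu}^2$ set equal to $n$, so that each column of $D_1$ has unit mean square:

$$n^{-1} \sum_{u=1}^{n} x_{iu}^2 = 1, \qquad i = 1, 2, \ldots, k.$$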


It is conjectured that a second order rotatable design may be generated by using as design points the vectors obtained by taking all possible sums of the $n$ rows of

$$D_1 = \begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_u' \\ \vdots \\ \mathbf{x}_n' \end{bmatrix}$$

taken $s$ at a time, where $s = 1, 2, \ldots, k$. The problem thus reduces to one of finding the moments of a design matrix $D$ derived in this way. We allow the vectors obtained by taking sums of $s$ rows to be multiplied by a constant $a_s \geq 0$. The constants $a_1, a_2, \ldots, a_k$ will be called radius multipliers. Then the $N$ by $k$ matrix $D$ for the derived design is

$$D = \begin{bmatrix} a_1 D_1 \\ a_2 D_2 \\ \vdots \\ a_s D_s \\ \vdots \\ a_k D_k \end{bmatrix},$$

where $N = \sum_{s=1}^{k} \binom{n}{s} = 2^n - 2$. Each $D_s$ is an $\binom{n}{s}$ by $k$ matrix whose rows consist of all possible sums of the rows of $D_1$ taken $s$ at a time. We omit for the moment the center point corresponding to $D_n$, obtained when all $n$ vectors are added together simultaneously. Since the columns of $D_1$ are orthogonal to a vector of ones, the $n$ rows of $D_1$ sum to the zero vector. It follows that each vector obtained by summing rows $s$ at a time is the negative of one obtained by summing rows $n - s$ at a time, namely the sum of the complementary rows. The points in the factor space represented by $D_s$ are, therefore, reflections through the origin of those represented by $D_{n-s}$. (Of course, when $n$ is even, $n - n/2 = n/2$ and half the rows of $D_{n/2}$ are reflections of the other half.)
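Both the count $N = 2^n - 2$ and the reflection property can be confirmed numerically, as in the Python sketch below. It builds every $D_s$ for a few values of $k$ and checks that each row of $D_s$ is the negative of the complementary sum in $D_{n-s}$. The helper names are illustrative. The simplex $D_1$ again comes from scaled Helmert contrasts, which is one arbitrary choice; any $D_1$ whose columns sum to zero would do.

```python
import itertools
import math


def simplex_design(k):
    """Simplex design in k factors from scaled Helmert contrasts (columns sum to zero)."""
    n = k + 1
    cols = []
    for j in range(1, k + 1):
        col = [1.0] * j + [-float(j)] + [0.0] * (n - j - 1)
        scale = math.sqrt(n / (j * (j + 1)))
        cols.append([c * scale for c in col])
    return [[cols[i][u] for i in range(k)] for u in range(n)]


def sum_rows(D1, subset):
    """Sum of the rows of D1 whose indices are in subset."""
    return [sum(D1[u][i] for u in subset) for i in range(len(D1[0]))]


def count_and_check_reflection(k, tol=1e-9):
    """Return N, the total number of rows in D_1, ..., D_k.

    Raises ValueError if some row of D_s is not the negative of the row of
    D_{n-s} formed from the complementary rows of D_1.
    """
    D1 = simplex_design(k)
    n = k + 1
    N = 0
    for s in range(1, n):
        for subset in itertools.combinations(range(n), s):
            complement = [u for u in range(n) if u not in subset]
            v = sum_rows(D1, subset)
            w = sum_rows(D1, complement)
            # All n rows of D1 sum to zero, so v + w should vanish.
            if any(abs(a + b) > tol for a, b in zip(v, w)):
                raise ValueError("reflection property fails")
            N += 1
    return N


for k in range(2, 6):
    print(k, count_and_check_reflection(k), 2 ** (k + 1) - 2)
```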


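Let us define the moment component $[1^{\alpha_1} 2^{\alpha_2} \cdots k^{\alpha_k}]_s$ as $\binom{n}{s} N^{-1}$ times the specified moment of $D_s$, i.e.,

$$[1^{\alpha_1} 2^{\alpha_2} \cdots k^{\alpha_k}]_s = N^{-1} \sum_{1 \leq u_1 < u_2 < \cdots < u_s \leq n} (x_{1u_1} + x_{1u_2} + \cdots + x_{1u_s})^{\alpha_1} (x_{2u_1} + x_{2u_2} + \cdots + x_{2u_s})^{\alpha_2} \cdots (x_{ku_1} + x_{ku_2} + \cdots + x_{ku_s})^{\alpha_k}.$$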








Then the corresponding moment for the entire design can be written

$$[1^{\alpha_1} 2^{\alpha_2} \cdots k^{\alpha_k}] = \sum_{s=1}^{k} (a_s)^{\alpha} \, [1^{\alpha_1} 2^{\alpha_2} \cdots k^{\alpha_k}]_s.$$
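The decomposition can be checked on a small case with the Python sketch below. It builds the derived design $D$ for $k = 3$ (center point omitted, so $N = 14$) and computes a few moments directly. It then compares them with $\sum_s a_s^{\alpha}[\cdots]_s$. The radius multipliers used are arbitrary illustrative values, not ones recommended by the paper, and the helper names are our own.

```python
import itertools
import math


def simplex_design(k):
    """Simplex design in k factors from scaled Helmert contrasts."""
    n = k + 1
    cols = []
    for j in range(1, k + 1):
        col = [1.0] * j + [-float(j)] + [0.0] * (n - j - 1)
        scale = math.sqrt(n / (j * (j + 1)))
        cols.append([c * scale for c in col])
    return [[cols[i][u] for i in range(k)] for u in range(n)]


def component(D1, s, alphas, N):
    """Moment component [1^a1 ... k^ak]_s: N^{-1} times the power-product sum over rows of D_s."""
    total = 0.0
    for rows in itertools.combinations(D1, s):
        point = [sum(r[i] for r in rows) for i in range(len(alphas))]
        total += math.prod(x ** a for x, a in zip(point, alphas))
    return total / N


def derived_design(D1, a):
    """Rows of D = [a_1 D_1; ...; a_k D_k] (centre point omitted)."""
    k = len(D1[0])
    return [[a[s] * sum(r[i] for r in rows) for i in range(k)]
            for s in range(1, k + 1) for rows in itertools.combinations(D1, s)]


def direct_moment(D, alphas):
    """Moment of the whole derived design, computed row by row."""
    return sum(math.prod(x ** e for x, e in zip(p, alphas)) for p in D) / len(D)


def moment_from_components(D1, a, alphas, N):
    """sum_s a_s^alpha [1^a1 ... k^ak]_s."""
    k = len(D1[0])
    return sum(a[s] ** sum(alphas) * component(D1, s, alphas, N) for s in range(1, k + 1))


k = 3
D1 = simplex_design(k)
a = {1: 1.0, 2: 0.8, 3: 1.0}          # hypothetical radius multipliers a_s
D = derived_design(D1, a)
N = len(D)                            # 2^n - 2 = 14
for alphas in [(2, 0, 0), (4, 0, 0), (2, 2, 0), (1, 1, 0)]:
    print(alphas, round(direct_moment(D, alphas), 10), round(moment_from_components(D1, a, alphas, N), 10))
```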


3.3. Analogy to Sampling from a Finite Population. The problem of finding expressions for $[1^{\alpha_1} 2^{\alpha_2} \cdots k^{\alpha_k}]_s$ in terms of either the moments of $D_1$ or of $[1^{\alpha_1} 2^{\alpha_2} \cdots k^{\alpha_k}]_1$ corresponds to that of finding the sampling moments of means (or totals) of samples of $s$ drawn from a $k$-variate finite population of $n$ elements. These moments can be derived by a method due to Tukey [8] and elaborated by Wishart [9], Hooke [10] and Robson [11]. The necessary derivations are given in [12], [13], and the results are utilized for our purposes here.

The sampling analogy is readily seen if we consider a $k$-variate vector of means obtained by averaging a random sample of $s$ $k$-variate vectors chosen from a population of $n$ such vectors,

$$(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k) = \frac{1}{s}\sum_{u=1}^{s}(x_{1u}, x_{2u}, \ldots, x_{ku}).$$

Then the joint sampling moments of these multivariate means are

$$\operatorname{Ave}\{\bar{x}_1^{\alpha_1}\bar{x}_2^{\alpha_2}\cdots\bar{x}_k^{\alpha_k}\},$$

Ave denoting the average value of the indicated power product over all combinations of samples of $s$. These expressions can be used to obtain the moment components of any submatrix $D_s$, since

$$[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}]_s = N^{-1}s^{\alpha}\binom{n}{s}\operatorname{Ave}\{\bar{x}_1^{\alpha_1}\bar{x}_2^{\alpha_2}\cdots\bar{x}_k^{\alpha_k}\}.$$

Using this equality and the results in [12] (recalling that $\sum_{u=1}^{n} x_{iu}^2 = n$, $i = 1, 2, \ldots, k$, here), the expressions for the required moment components are readily obtained; they are shown in Table 1a for $n \ge \alpha$. Table 1b gives the one case required here for $n < \alpha$ not covered by Table 1a.
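The definition of the moment components and the sampling analogy can both be checked numerically. This sketch assumes an illustrative QR-based simplex helper (our own, as are the other helper names), applies an arbitrary rotation to it, and uses arbitrary positive radius multipliers; it confirms that the whole-design moment is the $a_s^{\alpha}$-weighted sum of the components, and that $[1^{\alpha_1}\cdots k^{\alpha_k}]_s = N^{-1}s^{\alpha}\binom{n}{s}\operatorname{Ave}\{\bar{x}_1^{\alpha_1}\cdots\bar{x}_k^{\alpha_k}\}$.

```python
import itertools
from math import comb

import numpy as np


def regular_simplex(n):
    """First order simplex D1 (n x (n - 1)) built from a QR factorization."""
    M = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    Q, _ = np.linalg.qr(M)
    return np.sqrt(n) * Q[:, 1:]


def row_sums(D1, s):
    """All sums of the rows of D1 taken s at a time."""
    n = D1.shape[0]
    return np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(n), s)])


def component(D1, s, powers, N):
    """Moment component [1^a1 ... k^ak]_s: (1/N) * sum over rows of D_s of the power product."""
    return np.prod(row_sums(D1, s) ** powers, axis=1).sum() / N


k = 4
n = k + 1
N = 2 ** n - 2
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(k, k)))  # an orthogonal k x k matrix
D1 = regular_simplex(n) @ R                    # still a first order simplex design
a = rng.uniform(0.5, 1.5, size=k)              # arbitrary positive radius multipliers
powers = np.array([2, 1, 1, 0])                 # the moment [1^2 2 3]
alpha = int(powers.sum())

# Whole-design moment versus the a_s**alpha weighted sum of components.
D = np.vstack([a[s - 1] * row_sums(D1, s) for s in range(1, k + 1)])
whole = np.prod(D ** powers, axis=1).sum() / N
weighted = sum(a[s - 1] ** alpha * component(D1, s, powers, N) for s in range(1, k + 1))

# Sampling analogy: average power product of the means of samples of size s.
s = 2
means = row_sums(D1, s) / s
ave = np.prod(means ** powers, axis=1).mean()
via_sampling = s ** alpha * comb(n, s) * ave / N
print(np.isclose(whole, weighted), np.isclose(component(D1, s, powers, N), via_sampling))
```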

3.4. Form of Moment Components. Tables 1a and 1b show the moment components in a notation designed to simplify their use and to make their general pattern clear. Certain of the coefficients $C(s)$ have single subscripts, while the remainder have double subscripts. The former are not multiplied by unrestricted moment components of $D_1$ and hence are constant terms in the moment component equations for a given $n$ and $s$. The latter, however, are multiplied by $D_1$ moment components such as $[ij^3]_1$, or by combinations of $D_1$ moment components, and are therefore coefficients of quantities which will not in general be constant for different choices of $i, j, k, l, m$.

The values taken on by any coefficient function $C(s)$, when $n$ is held constant, possess a symmetry with respect to $s$ as a result of the reflection relationship between vectors in $D_s$ and $D_{n-s}$. Since one matrix is the negative of the other, their respective components can differ only by the factor $(-1)^{\alpha}$, or

$$[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}]_{n-s} = (-1)^{\alpha}[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}]_s.$$








TABLE 1a
Summary of generalized moment components of $D_s$ ($n \ge \alpha$)
[General Formulas and Abbreviations columns, including the continuation, are not recoverable from the scan.]


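TABLE 1b
Fourth order moment components of $D_s$ for $n = 3$ ($n < \alpha$)

Component        General Formula                      Abbreviations
$[ij^3]_s$       $C_{41}(s)[ij^3]_1$                  $C_{41}(s) = \dfrac{s}{3!}\dbinom{3}{s}[2 - 7(s - 1)]$
$[i^2j^2]_s$     $C_4(s) + C_{41}(s)[i^2j^2]_1$       $C_4(s) = \dfrac{9}{N}(s - 1)$
$[i^4]_s$        $3C_4(s) + C_{41}(s)[i^4]_1$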

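From Tables 1a and 1b we see that in general

$$[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}]_s = b_\alpha C_\alpha(s) + C_{\alpha 1}(s)[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}]_1 + d_{\alpha 2}C_{\alpha 2}(s)[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k} \mid 2]_1 + \cdots + d_{\alpha p}C_{\alpha p}(s)[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k} \mid p]_1, \tag{3.4}$$

where the $b_\alpha$ values are zero or positive constants varying with the particular partition of $\alpha = (\alpha_1\alpha_2\cdots\alpha_k)$.

It is readily shown in general [13], and can be confirmed by direct substitution, that $C_{\alpha i}(s) = (-1)^{\alpha}C_{\alpha i}(n - s)$.

4.0. Radius Multipliers and Rotatability. Having general formulas for the moment components contributed by each submatrix $D_s$ of a derived design matrix $D$, we now seek a suitable set of radius multipliers such that the moments of $D$

$$[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}] = \sum_{s=1}^{k} a_s^{\alpha}[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}]_s \tag{4.1}$$

will fulfill the requirements for rotatability listed under (3.1).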

By even-order moments we mean those for which $\alpha = \sum \alpha_i$ is even, and by odd-order moments those for which $\alpha$ is odd. In addition, we call those moments for which any $\alpha_i$ is odd, odd moments, and those for which all $\alpha_i$ are even, even moments. For rotatability all odd moments must be zero and all even moments of the same order must be specified multiples of each other.

From Table 1a we see that the moments $[i]$, $[ij]$ and $[i^2]$ of $D$ will satisfy the rotatability requirements for any choice of radius multipliers, since the corresponding odd moment components $[i]_s$ and $[ij]_s$ are identically zero and $[i^2]_s$ is constant for all $i$. The other moments, however, all involve "variable terms" $C_{\alpha i}(s)[\;\;]_1$, and only in the case of the even moments is the constant term $b_\alpha C_\alpha(s)$ added to this variable function. The moment requirements will generally be satisfied only if the radius multipliers are so chosen that each "variable term" sums to zero in the expression for every $[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}]$. For odd moments this is obviously required. For the even moments it would otherwise be impossible to attain the required constant ratio between moments of the same order, since the quantities $[\;\;]_1$ in general change in their relationships from one moment to another. The only further requirement for rotatability is that the constant terms $b_\alpha C_\alpha(s)$ are in the required ratios.








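Using the general form of $[1^{\alpha_1}2^{\alpha_2}\cdots k^{\alpha_k}]_s$ from (3.4) in (4.1) we have

$$\begin{aligned}
[1^{\alpha_1}\cdots k^{\alpha_k}] &= \sum_{s=1}^{k} a_s^{\alpha}[1^{\alpha_1}\cdots k^{\alpha_k}]_s \\
&= b_\alpha\sum_{s=1}^{k} a_s^{\alpha}C_\alpha(s) + [1^{\alpha_1}\cdots k^{\alpha_k}]_1\sum_{s=1}^{k} a_s^{\alpha}C_{\alpha 1}(s) + d_{\alpha 2}[1^{\alpha_1}\cdots k^{\alpha_k} \mid 2]_1\sum_{s=1}^{k} a_s^{\alpha}C_{\alpha 2}(s) \\
&\quad + \cdots + d_{\alpha p}[1^{\alpha_1}\cdots k^{\alpha_k} \mid p]_1\sum_{s=1}^{k} a_s^{\alpha}C_{\alpha p}(s),
\end{aligned}$$

where $b_\alpha C_\alpha(s)$ does not appear unless the moment is even, and we require

$$\sum_{s=1}^{k} a_s^{\alpha}C_{\alpha i}(s) = 0, \qquad i = 1, 2, \ldots, p.$$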


Since we have seen previously that $C_{\alpha i}(s) = (-1)^{\alpha}C_{\alpha i}(n - s)$, for all odd-order moments $C_{\alpha i}(s) = -C_{\alpha i}(n - s)$. We can say further, because of the factor $(n - 2s)$ in all such odd-order moment coefficients (Table 1a), that when $\alpha$ and $k$ are both odd, $C_{\alpha i}(n/2) = C_{\alpha i}((k + 1)/2) = 0$. Therefore, as long as the radius multipliers are selected such that $a_s = a_{n-s}$, all the odd-order moments will sum to zero for any value of $\alpha$. Setting $m = k/2$ when $k$ is even and $m = (k - 1)/2$ when $k$ is odd, it then follows, for such a choice of radius multipliers, that

$$\sum_{s=1}^{k} a_s^{\alpha}C_{\alpha i}(s) = \sum_{s=1}^{m} a_s^{\alpha}[C_{\alpha i}(s) + C_{\alpha i}(n - s)] = 0 \quad \text{for all } i,\ \alpha \text{ odd}.$$

We will call this type of solution for the radius multipliers, where $a_s = a_{n-s}$, a symmetric solution.
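The effect of a symmetric solution on the odd-order moments can be seen directly. In this illustrative sketch (the helper names, the random rotation and the multiplier values are arbitrary choices of ours), third and fifth order moments of a $k = 5$ derived design vanish to rounding error when $a_s = a_{n-s}$, while an asymmetric choice generally leaves them non-zero.

```python
import itertools

import numpy as np


def regular_simplex(n):
    """First order simplex D1 (n x (n - 1)) built from a QR factorization."""
    M = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    Q, _ = np.linalg.qr(M)
    return np.sqrt(n) * Q[:, 1:]


def row_sums(D1, s):
    """All sums of the rows of D1 taken s at a time."""
    n = D1.shape[0]
    return np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(n), s)])


def moment(D, powers):
    """(1/N) * sum over the rows of D of prod_i x_i**powers[i]."""
    return np.mean(np.prod(D ** np.asarray(powers), axis=1))


k = 5
n = k + 1
rng = np.random.default_rng(1)
R, _ = np.linalg.qr(rng.normal(size=(k, k)))
D1 = regular_simplex(n) @ R


def design(a):
    """Derived design with radius multipliers a_1, ..., a_k."""
    return np.vstack([a_s * row_sums(D1, s) for s, a_s in enumerate(a, start=1)])


odd_order = [[3, 0, 0, 0, 0], [1, 2, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
symmetric = [0.9, 0.7, 1.3, 0.7, 0.9]   # a_s = a_{n-s}
asymmetric = [0.9, 0.7, 1.3, 0.6, 1.1]

sym_max = max(abs(moment(design(symmetric), p)) for p in odd_order)
asym_vals = [moment(design(asymmetric), p) for p in odd_order]
print(f"largest odd-order moment, symmetric a: {sym_max:.1e}")
print("odd-order moments, asymmetric a:", np.round(asym_vals, 4))
```

The middle submatrix $D_3$ (here $n = 6$) is its own reflection, so it contributes nothing to odd-order moments whatever the value of $a_3$.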

Having satisfied the odd-order moment requirements for rotatable designs of any order, we must now find which symmetric solutions will also satisfy the requirements for even-order moments.


5.0. Second Order Requirements for Rotatability. For a design to be second order rotatable the even moments must have the following general form: $[i^2] = \lambda_2$, $[i^2j^2] = \lambda_4$, $[i^4] = 3\lambda_4$, where $\lambda_2$ and $\lambda_4$ are constants at choice, and the odd moments of order less than or equal to four must vanish.

It may be noted here that the addition of center points to a design matrix $D$ does not change the general form of the moments, since their only effect is to increase the denominator $N$.

5.1. Application of Moment Requirements. As noted previously in 4.0, the general second order moment $[i^2]$ places no restrictions on the choice of radius multipliers, since

$$[i^2] = \sum_{s=1}^{k} a_s^2C_2(s) = \frac{n}{N}\sum_{s=1}^{k} a_s^2\binom{n-2}{s-1} = \lambda_2,$$

a constant for all values of $i$.








From Tables 1a and 1b it can be seen that the generalized fourth order moment component is obtained by letting the coefficient $b_4$ vanish for odd moments and assume the values 3 and 1 for the even partitions of $\alpha$, viz., (4) and (2, 2). Hence

$$[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}]_s = b_4C_4(s) + C_{41}(s)[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}]_1,$$

so that

$$[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}] = b_4\sum_{s=1}^{k} a_s^4C_4(s) + [i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}]_1\sum_{s=1}^{k} a_s^4C_{41}(s).$$

In the previous section we showed that, for second order rotatability, we must have $\sum_{s=1}^{k} a_s^4C_{41}(s) = 0$, making

$$[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}] = b_4\sum_{s=1}^{k} a_s^4C_4(s).$$

This accomplished, all odd moments of order four would vanish with $b_4$, and

$$[i^2j^2] = \sum_{s=1}^{k} a_s^4C_4(s) = \lambda_4, \qquad [i^4] = 3\sum_{s=1}^{k} a_s^4C_4(s) = 3\lambda_4.$$

Clearly any symmetric solution for the radius multipliers such that

$$\sum_{s=1}^{k} a_s^4C_{41}(s) = 0$$

will provide a rotatable design of the simplex-sum type.
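5.2. Standard Solution for Radius Multipliers. We will now demonstrate that a solution holding for any $k$ is obtained by letting

$$a_s = \binom{n-2}{s-1}^{-1/4}, \qquad s = 1, 2, \ldots, k.$$

This solution for the radius multipliers, involving the binomial coefficients, will be denoted by $B_s^{-1/4}$ and referred to as the standard solution.

It is immediately evident that all odd-order moments will be zero, since this choice for the $a_s$ provides a symmetric solution as defined earlier. Further,

$$[i^2] = \sum_{s=1}^{k}\binom{n-2}{s-1}^{-1/2}C_2(s) = \frac{n}{N}\sum_{s=1}^{k}\binom{n-2}{s-1}^{1/2} = \lambda_2,$$

and, for each $i$, $[i^2]$ equals $n/N$ times the sum of the square roots of the binomial coefficients of order $n - 2$. For the fourth order moments,

$$[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}] = \sum_{s=1}^{k}\binom{n-2}{s-1}^{-1}\left\{b_4C_4(s) + \binom{n-2}{s-1}\frac{(n-2s)(n-3s) - n(s-1)}{(n-2)(n-3)}[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}]_1\right\},$$

in which the term in $[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}]_1$ is

$$\frac{[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}]_1}{(n-2)(n-3)}\sum_{s=1}^{n-1}(n^2 - 6ns + 6s^2 + n) = \frac{[i^{\alpha_i}j^{\alpha_j}k^{\alpha_k}l^{\alpha_l}]_1}{(n-2)(n-3)}\,(0) = 0.$$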








Since the vanishing sum is the expression $\sum_s a_s^4C_{41}(s)$, common to all fourth order moments, we have

$$[ij^3] = [i^2jk] = [ijkl] = 0,$$

$$\lambda_2 = \frac{n}{N}\sum_{s=1}^{n-1}\binom{n-2}{s-1}^{1/2},$$

$$\lambda_4 = \frac{n^2}{N}\sum_{s=2}^{n-2}\binom{n-4}{s-2}\binom{n-2}{s-1}^{-1} = \frac{n^2(n-1)}{6N}.$$
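The standard solution can be checked numerically. This sketch assumes the same kind of illustrative QR-based simplex helper, randomly rotated; for $k = 4, 5, 6$ it compares the computed $[i^2]$ and $[i^2j^2]$ with $\lambda_2 = (n/N)\sum_{s=1}^{n-1}\binom{n-2}{s-1}^{1/2}$ and $\lambda_4 = n^2(n-1)/(6N)$, and confirms that $[i^4] = 3\lambda_4$ and that the odd fourth order moments vanish.

```python
import itertools
from math import comb

import numpy as np


def regular_simplex(n):
    """First order simplex D1 (n x (n - 1)) built from a QR factorization."""
    M = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    Q, _ = np.linalg.qr(M)
    return np.sqrt(n) * Q[:, 1:]


def row_sums(D1, s):
    """All sums of the rows of D1 taken s at a time."""
    n = D1.shape[0]
    return np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(n), s)])


def check_standard(k, seed=0):
    """Moments of the standard solution design a_s = binom(n-2, s-1)**(-1/4)."""
    n = k + 1
    N = 2 ** n - 2
    R, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(k, k)))
    D1 = regular_simplex(n) @ R
    a = [comb(n - 2, s - 1) ** -0.25 for s in range(1, k + 1)]
    D = np.vstack([a_s * row_sums(D1, s) for s, a_s in enumerate(a, start=1)])
    x, y, z, w = D[:, 0], D[:, 1], D[:, 2], D[:, 3]
    lam4 = np.mean(x ** 2 * y ** 2)
    return {
        "lam2": np.mean(x ** 2),
        "lam2_formula": n / N * sum(comb(n - 2, s - 1) ** 0.5 for s in range(1, n)),
        "lam4": lam4,
        "lam4_formula": n ** 2 * (n - 1) / (6 * N),
        "i4_over_i2j2": np.mean(x ** 4) / lam4,
        "odd4": max(abs(np.mean(x * y ** 3)), abs(np.mean(x ** 2 * y * z)),
                    abs(np.mean(x * y * z * w))),
    }


for k in (4, 5, 6):
    r = check_standard(k)
    print(k, round(r["lam2"], 6), round(r["lam2_formula"], 6),
          round(r["lam4"], 6), round(r["lam4_formula"], 6), round(r["i4_over_i2j2"], 6))
```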


5.3. Second Order Rotatability for the Case $n = 3$. For $k = 2$ ($n = 3$) the above demonstration does not apply, since the fourth order moment formulas hold only for $n \ge 4$, as noted previously. However, by using the formulas in Table 1b, we can show that the above solution also applies here. For the odd fourth order moment,

$$[ij^3] = \sum_{s=1}^{2}\binom{1}{s-1}^{-1}\frac{s}{3!}\binom{3}{s}[2 - 7(s - 1)][ij^3]_1 = [ij^3]_1 - 5[ij^3]_1.$$

Although apparently inconsistent with previous results, this expression is zero because of a property of $3 \times 3$ matrices of the type $[\mathbf{1}\; \mathbf{x}_1\; \mathbf{x}_2]$ with orthogonal columns of equal vector length. Since we have already shown that the matrix of sums of rows taken $s$ at a time is the negative of the matrix of sums taken $n - s$ at a time, we have $D_2 = -D_1$ and $[ij^3]_2 = [ij^3]_1$. From the general moment formula for $n = 3$ we have $[ij^3]_2 = -5[ij^3]_1$, and hence $[ij^3]_1 = -5[ij^3]_1$, which must vanish. The moment

$$[i^2j^2] = \sum_{s=1}^{2}\binom{1}{s-1}^{-1}\left\{\frac{s}{3!}\binom{3}{s}[2 - 7(s - 1)][i^2j^2]_1 + \frac{9}{N}(s - 1)\right\} = [i^2j^2]_1 - 5[i^2j^2]_1 + \frac{9}{N}.$$

However, since $[i^2j^2]_1 = [i^2j^2]_2$ and $[i^2j^2]_2 = -5[i^2j^2]_1 + 9/N$, we have $[i^2j^2]_1 = 3/(2N)$, a constant for any matrix of this type. It then follows that $[i^2j^2] = 2[i^2j^2]_1 = 3/N = \lambda_4$, and similarly $[i^4] = 3\lambda_4$. Thus the moments are those of a rotatable design.
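The special property used for $n = 3$ holds for a triangle in any orientation. This sketch (orientations chosen arbitrarily) places the three points of $D_1$ at radius $\sqrt{2}$, so that each column sums to zero and has squared length 3, and confirms $[ij^3]_1 = 0$, $[i^2j^2]_1 = 3/(2N)$ and $[i^2j^2]_2 = 9/N - 5[i^2j^2]_1$ with $N = 6$.

```python
import numpy as np

N = 6  # the hexagon D_1, D_2 with no center points
results = []
for theta in np.linspace(0.0, 2.0, 5):  # several orientations of the triangle
    ang = theta + np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
    D1 = np.sqrt(2) * np.column_stack([np.cos(ang), np.sin(ang)])  # columns: sum 0, squared length 3
    D2 = np.array([D1[0] + D1[1], D1[0] + D1[2], D1[1] + D1[2]])    # sums of rows two at a time
    x1, y1 = D1[:, 0], D1[:, 1]
    x2, y2 = D2[:, 0], D2[:, 1]
    ij3_1 = np.sum(x1 * y1 ** 3) / N
    i2j2_1 = np.sum(x1 ** 2 * y1 ** 2) / N
    i2j2_2 = np.sum(x2 ** 2 * y2 ** 2) / N
    results.append((ij3_1, i2j2_1, i2j2_2))

for ij3_1, i2j2_1, i2j2_2 in results:
    print(f"{ij3_1:+.1e}  {i2j2_1:.4f}  {i2j2_2:.4f}  target 3/(2N) = {3 / (2 * N):.4f}")
```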

We have thus demonstrated that for $k \ge 2$ a second order rotatable design can always be derived from the first order simplex design. It is possible to show, however [13], that this method in its present form does not generate third order rotatable designs.

5.4. Radius of Experimental Points. As is illustrated in Figures 1a and 1b for the cases $k = 2$ and $k = 3$, the simplex-sum designs consist of subsets of vectors of experimental points corresponding to the rows of the submatrices $a_1D_1, a_2D_2, \ldots, a_kD_k$. Geometrically these subsets are symmetrically oriented one to another, in that the vectors for $a_2D_2$ bisect the edges of the simplex defined by $a_1D_1$, the vectors of $a_3D_3$ pass symmetrically through the faces of the simplex defined by $a_1D_1$, and so on. We can readily obtain an expression for $r_s$, the radius of the points in the $s$th subset. Denoting the $u$th row of $D_s$ by $\mathbf{x}'_{(s)u}$, $s = 2, 3, \ldots, k$, we have $\mathbf{x}_{(s)u} = \sum_{i=1}^{s}\mathbf{x}_{u_i}$, where $\mathbf{x}'_{u_1}, \mathbf{x}'_{u_2}, \ldots, \mathbf{x}'_{u_s}$ is the $u$th set of $s$ rows of the first order design matrix $D_1$. Now since

$$[\mathbf{1}\; D_1] = \begin{bmatrix} 1 & \mathbf{x}'_1 \\ \vdots & \vdots \\ 1 & \mathbf{x}'_n \end{bmatrix}$$

and $[\mathbf{1}\; D_1][\mathbf{1}\; D_1]' = nI$, we have

$$\mathbf{x}'_u\mathbf{x}_v = \begin{cases} n - 1 = k, & u = v, \\ -1, & u \ne v. \end{cases}$$

The square of the length of the row vector $\mathbf{x}'_{(s)u}$ is therefore

$$\mathbf{x}'_{(s)u}\mathbf{x}_{(s)u} = (\mathbf{x}_{u_1} + \mathbf{x}_{u_2} + \cdots + \mathbf{x}_{u_s})'(\mathbf{x}_{u_1} + \mathbf{x}_{u_2} + \cdots + \mathbf{x}_{u_s}) = s(n - 1) + 2\binom{s}{2}(-1) = s(n - s).$$

Thus the radius of the experimental points in any submatrix $a_sD_s$ is given by $r_s = a_s[s(n - s)]^{1/2}$, and since in a symmetric solution $a_s = a_{n-s}$, $r_s = r_{n-s}$. For the particular set of radius multipliers of the standard solution, $r_s = \binom{n-2}{s-1}^{-1/4}[s(n - s)]^{1/2}$. A summary of the radii for $k = 2$ through 8 of the standard solution rotatable designs is given in Table 2.


TABLE 2
Radii of experimental points for standard solution rotatable designs

k    r_1   r_2   r_3   r_4   r_5   r_6   r_7   r_8
2   1.41  1.41
3   1.73  1.68  1.73
4   2.00  1.86  1.86  2.00
5   2.24  2.00  1.92  2.00  2.24
6   2.45  2.11  1.95  1.95  2.11  2.45
7   2.65  2.21  1.97  1.89  1.97  2.21  2.65
8   2.83  2.30  1.98  1.84  1.84  1.98  2.30  2.83
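The entries of Table 2 follow from $r_s = \binom{n-2}{s-1}^{-1/4}[s(n - s)]^{1/2}$. The sketch below tabulates them and, as a cross-check, measures the actual row lengths of $a_sD_s$ for an illustrative $k = 4$ simplex built with a QR-based helper of our own.

```python
import itertools
from math import comb, sqrt

import numpy as np


def standard_radii(k):
    """r_s = binom(n-2, s-1)**(-1/4) * sqrt(s (n - s)) for the standard solution."""
    n = k + 1
    return [comb(n - 2, s - 1) ** -0.25 * sqrt(s * (n - s)) for s in range(1, k + 1)]


def regular_simplex(n):
    """First order simplex D1 (n x (n - 1)) built from a QR factorization."""
    M = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    Q, _ = np.linalg.qr(M)
    return np.sqrt(n) * Q[:, 1:]


for k in range(2, 9):
    print(k, " ".join(f"{r:.2f}" for r in standard_radii(k)))

# Cross-check against the row lengths of a_s D_s for k = 4.
k, n = 4, 5
D1 = regular_simplex(n)
lengths = []
for s in range(1, k + 1):
    Ds = np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(n), s)])
    lengths.append(np.linalg.norm(comb(n - 2, s - 1) ** -0.25 * Ds, axis=1))
```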


5.5. Singularity and Near Singularity of Moment Matrices. A set of points can have the moments of a rotatable design but be impractical as a design because it leads to a singular moment matrix. The singularity arises from a dependency between the columns in the $X$ matrix for the $b_0$ and quadratic terms $b_{11}, b_{22}, \ldots, b_{kk}$. The situation is easily remedied, however, by the addition of center points to the design matrix. The moment matrix is singular [5] when the standardized fourth moment constant $\lambda'_4 = \lambda_4/\lambda_2^2$ achieves the value $k/(k + 2)$, implying that the design points all lie on the same hypersphere [14]. For the designs arising from the standard solution for $a_s$ we have

$$\lambda'_4 = \frac{\lambda_4}{\lambda_2^2} = \frac{n^2(n-1)/(6N)}{\left[\dfrac{n}{N}\sum_{s=1}^{n-1}\dbinom{n-2}{s-1}^{1/2}\right]^2} = \frac{(n-1)(2^n - 2)}{6\left[\sum_{s=1}^{n-1}\dbinom{n-2}{s-1}^{1/2}\right]^2},$$

where we have used $N = 2^n - 2$, i.e., no center points have been added. The value of $\lambda'_4$ is equal to the singular value $k/(k + 2)$ when $k = 2$ and remains close to the singular value as $k$ increases, as is shown in Table 3.

TABLE 3
Comparison of $\lambda'_4$ to its singular value for standard solution designs

k    k/(k + 2)    λ'_4
2      .500       .500
3      .600       .601
4      .667       .670
5      .714       .724
6      .750       .769
7      .778       .811
8      .800       .850
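The last column of Table 3 can be recomputed from $\lambda'_4 = (n-1)(2^n - 2)/\{6[\sum_{s=1}^{n-1}\binom{n-2}{s-1}^{1/2}]^2\}$. In this sketch the optional `n_center` argument (our addition) applies the factor $(2^n - 2 + N_0)/(2^n - 2)$ produced by $N_0$ added center points.

```python
from math import comb


def standardized_lambda4(k, n_center=0):
    """lambda_4 / lambda_2**2 for the standard solution with n_center center points."""
    n = k + 1
    N = 2 ** n - 2
    root_sum = sum(comb(n - 2, s - 1) ** 0.5 for s in range(1, n))
    return (n - 1) * (N + n_center) / (6 * root_sum ** 2)


for k in range(2, 9):
    print(k, f"{k / (k + 2):.3f}", f"{standardized_lambda4(k):.3f}")
```

For $k = 2$ the value equals the singular value exactly, so at least one center point is indispensable there.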

Since the addition of center points has no effect on the moments except to change $N$, the addition of $N_0$ center points will change $\lambda'_4$ by a factor of $(2^n - 2 + N_0)/(2^n - 2)$. In practice, sufficient center points were added to provide a satisfactory profile for the variance function $V(\hat{y}_{\mathbf{x}})$ taken along a radius vector. Denoting the distance from the center of the design by $\rho = (\mathbf{x}'\mathbf{x})^{1/2}$, it is suggested in general in [5] that sufficient points be added so that $V(\hat{y}_{\mathbf{x}})$ at $\rho = 0$ is equal to that at $\rho = \lambda_2^{1/2}$. Such an arrangement causes the variance to be approximately uniform over the important range $\rho = 0$ to $\rho = \lambda_2^{1/2}$. These designs will be said to attain "uniform variance".
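A sketch of how the number of "uniform variance" center points can be computed. It assumes the rotatable variance function without blocking (the single-block case of the variance function in Section 8.2), for which setting $V$ at $\rho = 0$ equal to $V$ at $\rho = \lambda_2^{1/2}$ reduces, by our own algebra, to $2(k+2)\lambda_4'^2 - (k+3)\lambda'_4 - (k-1) = 0$; since $\lambda'_4$ grows in proportion to $N + N_0$, $N_0$ follows directly. The helper names and the QR-based simplex are illustrative choices. For the $k = 7$ design of Section 9 the sketch gives the ten center points quoted there.

```python
import itertools
from math import sqrt

import numpy as np


def regular_simplex(n):
    """First order simplex D1 (n x (n - 1)) built from a QR factorization."""
    M = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    Q, _ = np.linalg.qr(M)
    return np.sqrt(n) * Q[:, 1:]


def row_sums(D1, s):
    """All sums of the rows of D1 taken s at a time."""
    n = D1.shape[0]
    return np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(n), s)])


def uniform_variance_centers(a):
    """(rounded, exact) number of center points equating V at rho = 0 and rho = lambda_2**0.5."""
    k = len(a)
    D1 = regular_simplex(k + 1)
    D = np.vstack([a_s * row_sums(D1, s) for s, a_s in enumerate(a, start=1) if a_s != 0])
    N = D.shape[0]
    s2 = np.sum(D[:, 0] ** 2)                  # N * lambda_2
    s22 = np.sum(D[:, 0] ** 2 * D[:, 1] ** 2)   # N * lambda_4
    lam4_std_0 = N * s22 / s2 ** 2             # lambda_4 / lambda_2**2 with no center points
    # Positive root of 2(k+2) L**2 - (k+3) L - (k-1) = 0.
    target = ((k + 3) + sqrt((k + 3) ** 2 + 8 * (k + 2) * (k - 1))) / (4 * (k + 2))
    exact = float(N * target / lam4_std_0 - N)
    return int(round(exact)), exact


designs = {
    "k = 5, Y1": [1, 0, 3 ** -0.25, 0, 1],
    "k = 5, Y2": [1, 2 ** -0.25, 0, 2 ** -0.25, 1],
    "k = 7, Y3": [0, 1, 0, 0, 0, 1, 0],
}
for name, a in designs.items():
    print(name, uniform_variance_centers(a))
```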





6.0. Additional Second Order Rotatable Simplex-Sum Designs. The standard solution for $a_s$ affords a set of rotatable designs for all $k \ge 2$. When $k \ge 5$, however, the number of experiments required by the standard solution becomes excessive. Fortunately, for such values of $k$ smaller reduced designs are possible.

6.1. Solution Space of Radius Multipliers. We have shown in Section 4 that for second order rotatability we must find values for $a_s$, $s = 1, 2, \ldots, k$, such that $\sum a_s^3C_{31}(s) = 0$ and $\sum a_s^4C_{41}(s) = 0$, where $C_{31}(s)$ and $C_{41}(s)$ are the coefficients of the moment components of $D_s$ (Tables 1a and 1b). When those values are found, it was shown, the other moment requirements are automatically satisfied.

To state these requirements in a more convenient form for our present problem, let us define the vectors

$$\begin{aligned}
\mathbf{a}'_1 &= (a_1\; a_2\; \cdots\; a_s\; \cdots\; a_k), \\
\mathbf{a}'_3 &= (a_1^3\; a_2^3\; \cdots\; a_s^3\; \cdots\; a_k^3), \\
\mathbf{a}'_4 &= (a_1^4\; a_2^4\; \cdots\; a_s^4\; \cdots\; a_k^4), \\
\mathbf{C}'_{31} &= (C_{31}(1)\; C_{31}(2)\; \cdots\; C_{31}(s)\; \cdots\; C_{31}(k)), \\
\mathbf{C}'_{41} &= (C_{41}(1)\; C_{41}(2)\; \cdots\; C_{41}(s)\; \cdots\; C_{41}(k)).
\end{aligned}$$

The requirements for second order rotatability are therefore $\mathbf{C}'_{31}\mathbf{a}_3 = 0$ and $\mathbf{C}'_{41}\mathbf{a}_4 = 0$. If we choose values of $a_s$ such that $a_s = a_{n-s}$, we have shown previously that $\mathbf{C}'_{31}\mathbf{a}_3 = 0$. Therefore, calling any vectors $\mathbf{a}_3$ and $\mathbf{a}_4$ which are derived from symmetric solutions symmetric vectors, we may further simplify our problem to that of finding all symmetric vectors $\mathbf{a}_4$ such that $\mathbf{C}'_{41}\mathbf{a}_4 = 0$. We must also add the restriction, of course, that no element of $\mathbf{a}_4$ is negative.

The restriction of symmetry on the vector $\mathbf{a}_4$ has the effect of confining its values to an $m$-dimensional subspace, where $m = k/2$ if $k$ is even and $m = (k + 1)/2$ if $k$ is odd. This is evident since $\mathbf{a}_4$ has exactly $m$ elements which can be varied independently, the remaining $k - m$ elements then being determined by the relationship $a_s = a_{n-s}$. The elements of $\mathbf{C}_{41}$ are symmetric in a corresponding way, as was shown earlier. Hence for convenience we may consider $\mathbf{a}_4$ and $\mathbf{C}_{41}$ as two $m$-dimensional vectors and use the fact that $m - 1$ independent vectors can be found orthogonal to any vector in $m$-space. Thus if we find $m - 1$ independent solutions to the equation $\mathbf{C}'_{41}\mathbf{a}_4 = 0$, they will form a basis for the solution space of all possible vectors satisfying this equation, that is, of all vectors in the $(m - 1)$-space orthogonal to $\mathbf{C}_{41}$. Since the elements of $\mathbf{C}_{41}$ are of mixed sign, it is clear that solution vectors can be found which fall in the positive $2^m$-drant.

6.2. Specific Solutions. We will now obtain the $m - 1$ basis vectors $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_{m-1}$ for $k = 3, 4, \ldots, 8$, selecting them to contain the maximum possible number of zero elements. Where zeros can be introduced, the equivalent designs will involve fewer points than the standard solution, since any submatrix with a zero radius multiplier may be eliminated from $D$ without altering the form of the moments. All other designs resulting from the orthogonality relationship can be derived from these basis vectors by taking linear combinations

$$\mathbf{a}_4 = d_1\mathbf{Y}_1 + d_2\mathbf{Y}_2 + \cdots + d_{m-1}\mathbf{Y}_{m-1},$$

where the $d_i$ are any constants such that $\mathbf{a}_4 \ge 0$.

It will be recalled from the discussion of the standard solution that the two factor design is an anomaly, in that its rotatability does not result from the orthogonality relationship. For $k = 2$, $\mathbf{C}'_{41}\mathbf{a}_4 \ne 0$, and hence a specific solution does not follow in the usual way. When $k = 3$, $m = 2$ and hence only one solution, the standard solution, is available ($\mathbf{Y}_1 = \mathbf{a}_4$). Similarly, when $k = 4$, $m = 2$, so that for $k \le 4$ the standard solution is unique. When $k = 5$, however, $m = 3$ and two independent solutions $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are possible. Specifically,

$$\mathbf{C}'_{41}\mathbf{Y}_i = (1\; -2\; -6\; -2\; 1)\,\mathbf{Y}_i = 0,$$

and here for the first time we can obtain reduced designs. Two suitable basis vectors are

$$\mathbf{Y}'_1 = (1\; 0\; \tfrac{1}{3}\; 0\; 1), \qquad \mathbf{Y}'_2 = (1\; \tfrac{1}{2}\; 0\; \tfrac{1}{2}\; 1),$$

whence, respectively,

$$\mathbf{a}'_1 = (1\; 0\; 3^{-1/4}\; 0\; 1), \qquad \mathbf{a}'_1 = (1\; 2^{-1/4}\; 0\; 2^{-1/4}\; 1).$$

The arrangement employing $\mathbf{Y}_1$ omits $a_2D_2$ and $a_4D_4$, while that employing $\mathbf{Y}_2$ omits $a_3D_3$ from the design.
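When $k = 6$, then $m = 3$ and the relationship

$$\mathbf{C}'_{41}\mathbf{Y}_i = (1\; -1\; -8\; -8\; -1\; 1)\,\mathbf{Y}_i = 0$$

is satisfied by

$$\mathbf{Y}'_1 = (1\; 1\; 0\; 0\; 1\; 1), \qquad \mathbf{Y}'_2 = (1\; 0\; \tfrac{1}{8}\; \tfrac{1}{8}\; 0\; 1).$$

When $k = 7$, then $m = 4$ and

$$\mathbf{C}'_{41}\mathbf{Y}_i = (1\; 0\; -9\; -16\; -9\; 0\; 1)\,\mathbf{Y}_i = 0$$

is satisfied by

$$\mathbf{Y}'_1 = (1\; 0\; \tfrac{1}{9}\; 0\; \tfrac{1}{9}\; 0\; 1), \quad \mathbf{Y}'_2 = (1\; 0\; 0\; \tfrac{1}{8}\; 0\; 0\; 1), \quad \mathbf{Y}'_3 = (0\; 1\; 0\; 0\; 0\; 1\; 0).$$

When $k = 8$, then $m = 4$ and

$$\mathbf{C}'_{41}\mathbf{Y}_i = (1\; 1\; -9\; -25\; -25\; -9\; 1\; 1)\,\mathbf{Y}_i = 0,$$

which is satisfied by

$$\mathbf{Y}'_1 = (1\; 0\; \tfrac{1}{9}\; 0\; 0\; \tfrac{1}{9}\; 0\; 1), \quad \mathbf{Y}'_2 = (1\; 0\; 0\; \tfrac{1}{25}\; \tfrac{1}{25}\; 0\; 0\; 1), \quad \mathbf{Y}'_3 = (0\; 1\; \tfrac{1}{9}\; 0\; 0\; \tfrac{1}{9}\; 1\; 0).$$

A fourth reduced design can be derived from the vector

$$\mathbf{a}_4 = -\mathbf{Y}_1 + \mathbf{Y}_2 + \mathbf{Y}_3, \qquad \mathbf{a}'_4 = (0\; 1\; 0\; \tfrac{1}{25}\; \tfrac{1}{25}\; 0\; 1\; 0).$$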
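The basis vectors can be verified with exact rational arithmetic. This sketch assumes the closed form $C_{41}(s) = \binom{n-2}{s-1}[(n-2s)(n-3s) - n(s-1)]/[(n-2)(n-3)]$, as we read it from the Section 5.2 computation, and checks that it reproduces the $\mathbf{C}'_{41}$ rows quoted above, that it is symmetric in $s$ and $n - s$, that every listed $\mathbf{Y}_i$ is orthogonal to it, and that $-\mathbf{Y}_1 + \mathbf{Y}_2 + \mathbf{Y}_3$ gives the fourth $k = 8$ vector. A brute-force function (our own, using a randomly rotated QR-based simplex) confirms the closed form through $[ij^3]_s = C_{41}(s)[ij^3]_1$.

```python
import itertools
from fractions import Fraction as F
from math import comb

import numpy as np


def c41(n, s):
    """C_41(s) = binom(n-2, s-1) [(n-2s)(n-3s) - n(s-1)] / ((n-2)(n-3)), exactly."""
    return F(comb(n - 2, s - 1) * ((n - 2 * s) * (n - 3 * s) - n * (s - 1)), (n - 2) * (n - 3))


def c41_vector(k):
    return [c41(k + 1, s) for s in range(1, k + 1)]


basis = {
    5: [[1, 0, F(1, 3), 0, 1], [1, F(1, 2), 0, F(1, 2), 1]],
    6: [[1, 1, 0, 0, 1, 1], [1, 0, F(1, 8), F(1, 8), 0, 1]],
    7: [[1, 0, F(1, 9), 0, F(1, 9), 0, 1], [1, 0, 0, F(1, 8), 0, 0, 1], [0, 1, 0, 0, 0, 1, 0]],
    8: [[1, 0, F(1, 9), 0, 0, F(1, 9), 0, 1], [1, 0, 0, F(1, 25), F(1, 25), 0, 0, 1],
        [0, 1, F(1, 9), 0, 0, F(1, 9), 1, 0]],
}
for k, ys in basis.items():
    c = c41_vector(k)
    dots = [sum(ci * yi for ci, yi in zip(c, y)) for y in ys]
    print(k, [int(v) for v in c], [str(d) for d in dots])

Y1, Y2, Y3 = basis[8]
fourth = [-p + q + r for p, q, r in zip(Y1, Y2, Y3)]  # a_4 = -Y1 + Y2 + Y3 for k = 8


def brute_force_gap(k, seed=0):
    """Largest |sum_{D_s} x y^3 - C_41(s) sum_{D_1} x y^3| over s (the 1/N cancels)."""
    n = k + 1
    Q, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]]))
    R, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(k, k)))
    D1 = np.sqrt(n) * Q[:, 1:] @ R
    base = np.sum(D1[:, 0] * D1[:, 1] ** 3)
    gaps = []
    for s in range(1, k + 1):
        Ds = np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(n), s)])
        gaps.append(abs(np.sum(Ds[:, 0] * Ds[:, 1] ** 3) - float(c41(n, s)) * base))
    return max(gaps)
```

The radius multipliers themselves are the fourth roots of these vectors, e.g. $3^{-1/4} \approx 0.7598$ for the middle element of $\mathbf{Y}_1$ when $k = 5$.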








TABLE 4
Radius multipliers for some second order rotatable designs
(Columns: $k$; design, i.e., standard solution or basis vector; radius multipliers $a_1, \ldots, a_k$; numbers of radial points and of center points† for the simplex-sum designs and for the composite designs*. Table body not recoverable from the scan.)

* The "Composite Design" values refer to the composite second order rotatable designs derived in [5] and are included for comparative purposes. Half replicates of the cube portion are used for $k = 5$, 6 and 7 and one quarter replicate for $k = 8$.

† Number of center points required for "uniform variance" within $\rho = \lambda_2^{1/2}$.


A summary of the radius multipliers used to obtain the standard solution designs ($B_s^{-1/4}$) and the specific solution designs derived from the basis vectors is given in Table 4. It can be seen that only the reduced designs will be practical in most instances when $k > 4$, since $N$ increases rapidly. Also included in the table are the numbers of center points required to attain "uniform variance".

In order to produce a design using Table 4, it is only necessary to select a suitable matrix $D_1$ and, by taking all sums of rows $s$ at a time for each $s$ with a non-zero $a_s$, generate the required $D_s$ matrices. Multiplication of each $D_s$ by $a_s$ will then give the coordinates of the design points. An example is given in Section 9.
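The recipe just described can be sketched as follows. The helper names and the randomly rotated QR-based $D_1$ are our own illustrative choices; the sketch builds the fourth reduced $k = 8$ design ($\mathbf{a}_4 = -\mathbf{Y}_1 + \mathbf{Y}_2 + \mathbf{Y}_3$) and the $k = 6$ design from $\mathbf{Y}_1$, reports their numbers of radial points, and checks $[i^4] = 3[i^2j^2]$ together with the vanishing of representative odd moments.

```python
import itertools

import numpy as np


def regular_simplex(n):
    """First order simplex D1 (n x (n - 1)) built from a QR factorization."""
    M = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    Q, _ = np.linalg.qr(M)
    return np.sqrt(n) * Q[:, 1:]


def build(a, seed=0):
    """Derived design from radius multipliers a (zero entries drop D_s)."""
    k = len(a)
    n = k + 1
    R, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(k, k)))
    D1 = regular_simplex(n) @ R
    parts = []
    for s, a_s in enumerate(a, start=1):
        if a_s == 0:
            continue
        Ds = np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(n), s)])
        parts.append(a_s * Ds)
    return np.vstack(parts)


def rotatability_gaps(D):
    """Deviations from [i^4] = 3[i^2 j^2] and from zero odd moments (order <= 4)."""
    x, y, z = D[:, 0], D[:, 1], D[:, 2]
    lam4 = np.mean(x ** 2 * y ** 2)
    return (abs(np.mean(x ** 4) / lam4 - 3), abs(np.mean(x ** 3)),
            abs(np.mean(x * y ** 3)), abs(np.mean(x ** 2 * y * z)))


fourth_k8 = np.array([0, 1, 0, 1 / 25, 1 / 25, 0, 1, 0]) ** 0.25  # a_s = (a_4 element)**(1/4)
y1_k6 = np.array([1, 1, 0, 0, 1, 1]) ** 0.25
for name, a in [("k = 8, -Y1 + Y2 + Y3", fourth_k8), ("k = 6, Y1", y1_k6)]:
    D = build(a)
    print(name, D.shape, max(rotatability_gaps(D)))
```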








7.0. Replication. If it should be desired to replicate certain subsets of the derived matrices, this can easily be done by making suitable adjustments to the radius multipliers. We will only consider the case where symmetric replication is used (i.e., $D_s$ and $D_{n-s}$ are replicated equally), thus ensuring that a symmetric solution for the radius multipliers can be found.

If we replicate a particular pair of submatrices $D_s$ and $D_{n-s}$ $\nu_s$ times, the elements $C_{31}(s)$, $C_{41}(s)$, $C_{31}(n - s)$ and $C_{41}(n - s)$ will be multiplied by $\nu_s$, and the moment equations will become

$$\sum_{s=1}^{k}\nu_sa_s^3C_{31}(s) = 0, \qquad \sum_{s=1}^{k}\nu_sa_s^4C_{41}(s) = 0.$$

The first equation will still be negatively symmetric and will therefore be satisfied by any symmetric vector. The second equation will be satisfied if the new values of $\nu_sa_s^4$ equal the old values of $a_s^4$. Thus

$$a_s(D_s \text{ replicated } \nu_s \text{ times}) = a_s(\text{unreplicated})/\nu_s^{1/4},$$

and a similar relation holds for the radii.

For example, consider the standard solution for $k = 3$ and various patterns of replication. (We will always have $\nu_1a_1^4 = 1$, $\nu_2a_2^4 = \tfrac{1}{2}$, $\nu_3a_3^4 = 1$.) Table 5 shows some results.











TABLE 5
The standard solution with $k = 3$ and various replication patterns
(Columns: replication pattern; replications $\nu_1, \nu_2, \nu_3$; radius multipliers $a_1, a_2, a_3$; radii $r_1, r_2, r_3$. Table body only partly legible in the scan and not reproduced.)
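A sketch of the replication rule for one illustrative pattern, $\nu = (2, 1, 2)$, applied to the $k = 3$ standard solution: the adjusted multipliers are $a_s/\nu_s^{1/4}$, and the replicated design, built from a randomly rotated QR-based simplex of our own choosing, still satisfies $[i^4] = 3[i^2j^2]$.

```python
import itertools
from math import comb

import numpy as np


def regular_simplex(n):
    """First order simplex D1 (n x (n - 1)) built from a QR factorization."""
    M = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    Q, _ = np.linalg.qr(M)
    return np.sqrt(n) * Q[:, 1:]


k, n = 3, 4
standard = [comb(n - 2, s - 1) ** -0.25 for s in range(1, k + 1)]
nu = [2, 1, 2]                                   # replicate D_1 and D_3 twice (illustrative)
a = [a_s / v ** 0.25 for a_s, v in zip(standard, nu)]
radii = [a_s * (s * (n - s)) ** 0.5 for s, a_s in enumerate(a, start=1)]

R, _ = np.linalg.qr(np.random.default_rng(3).normal(size=(k, k)))
D1 = regular_simplex(n) @ R
parts = []
for s, (a_s, v) in enumerate(zip(a, nu), start=1):
    Ds = np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(n), s)])
    parts.extend([a_s * Ds] * v)                 # each replicate repeats the same points
D = np.vstack(parts)
x, y = D[:, 0], D[:, 1]
ratio = np.mean(x ** 4) / np.mean(x ** 2 * y ** 2)
print([round(v, 3) for v in a], [round(r, 2) for r in radii], D.shape, round(ratio, 6))
```

With this pattern all three multipliers become $2^{-1/4}$, so the replicated arrangement uses a single radius multiplier throughout.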





8.0. Blocking. When an experiment cannot be run under homogeneous conditions it is usually desirable to block the trials in such a way that the coefficients can be estimated efficiently while the error is confined to the magnitude of the variation within blocks. We will assume that, under the experimental conditions peculiar to any block, the relationship of the response to the factors remains unchanged except for a shift in level. Following the development in [5], we then assume that the expected value of the $u$th experimental observation is represented by the model

$$\eta_u = \beta_0 + \sum_{i=1}^{k}\beta_ix_{iu} + \sum_{i=1}^{k}\sum_{j=i}^{k}\beta_{ij}x_{iu}x_{ju} + \sum_{w=1}^{m}\delta_w\left(z_{wu} - \frac{n_w}{N}\right),$$

where

$$\beta_0 = \sum_{w=1}^{m}\frac{n_w}{N}\beta_{0w}, \qquad \delta_w = \beta_{0w} - \beta_0,$$

and $\beta_{0w}$ is the level parameter for the $w$th block, $z_{wu}$ is a dummy variable assuming the value unity when the $u$th experiment falls in block $w$ and zero otherwise, $n_w$ is the number of observations in the $w$th block (including center points), and $N = \sum_{w=1}^{m}n_w$.

8.1. Orthogonal Blocking—Rotatable Designs. It is shown in [5] that orthogonal blocking is obtained when the within-block moment components of the design (denoted by $[i^{\alpha_i}j^{\alpha_j}]_w$) have the following properties:

1. $[i]_w = \dfrac{1}{N}\sum^{(w)} x_{iu} = 0$,
2. $[ij]_w = \dfrac{1}{N}\sum^{(w)} x_{iu}x_{ju} = 0$, $i \ne j$,
3. $[i^2]_w = \dfrac{1}{N}\sum^{(w)} x_{iu}^2 = \dfrac{n_w}{N}\lambda_2$, $w = 1, 2, \ldots, m$,

where $\sum^{(w)}$ indicates summation over the $n_w$ design points within the $w$th block.

The blocking arrangements we consider here will be called submatrix blocking schemes, since they utilize the submatrices $a_1D_1, a_2D_2, \ldots, a_kD_k$, or combinations of them, as blocks. From the general formulas for the moment components of these submatrices it is clear that they individually satisfy the first two conditions above. To individually satisfy the third condition, however, it is necessary that the quantities $a_s^2$ be such that their ratios are rational numbers. Instead of using the submatrices themselves as the basis for blocking, combinations of these submatrices can be employed. If the $a_s$ are such that they allow blocks to be formed which yield a ratio $[i^2]_w/\lambda_2$ equal to a rational number, then orthogonal blocks can be obtained. Table 6 shows some blocking arrangements derived in this way for the designs in Table 4.
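The three conditions are easy to check for any proposed set of blocks. This sketch (the function name is ours) reports the largest violation of each condition and applies it to the $k = 2$ hexagon split into the submatrix blocks $a_1D_1$ and $a_2D_2$: with equal numbers of center points in the two blocks all three conditions hold, while an unequal split violates only condition 3.

```python
import numpy as np


def blocking_conditions(blocks):
    """Largest violations of conditions 1-3 of Section 8.1 for a list of blocks
    (arrays of design points, center points included)."""
    X = np.vstack(blocks)
    N = X.shape[0]
    lam2 = np.mean(X[:, 0] ** 2)
    c1 = c2 = c3 = 0.0
    for B in blocks:
        G = B.T @ B / N  # within-block second moments [ij]_w
        c1 = max(c1, np.abs(B.sum(axis=0) / N).max())
        c2 = max(c2, np.abs(G - np.diag(np.diag(G))).max())
        c3 = max(c3, np.abs(np.diag(G) - len(B) / N * lam2).max())
    return c1, c2, c3


ang = np.array([np.pi / 2, 7 * np.pi / 6, 11 * np.pi / 6])
D1 = np.sqrt(2) * np.column_stack([np.cos(ang), np.sin(ang)])  # k = 2 simplex
D2 = -D1                                                         # sums of pairs, up to row order
center = np.zeros((1, 2))

equal = [np.vstack([D1] + [center] * 2), np.vstack([D2] + [center] * 2)]
unequal = [np.vstack([D1] + [center] * 1), np.vstack([D2] + [center] * 3)]
print(blocking_conditions(equal))
print(blocking_conditions(unequal))
```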

In general the individual submatrices cannot be employed as blocks without sacrificing either orthogonal blocking or rotatability. It is naturally most reasonable to sacrifice rotatability, since clearly we only require an approximately "symmetric distribution" of information. Unfortunately, when the conditions for rotatability are relaxed in this way the general inverse of the resulting matrix is not easily written down. When an electronic computer is used in the analysis of the data, however, this presents little difficulty. The radius multipliers required for orthogonal blocking differ little from those required for rotatability, and the resulting designs are thus nearly rotatable. Table 7 provides these values of $a_s$ together with the "uniform variance" number of center points for each submatrix block.

TABLE 6
Summary of orthogonal blocking schemes for rotatable designs of Table 4
(Columns: design; block; number of points in the block from each submatrix $a_1D_1, \ldots, a_8D_8$; center points added*; total number of points in the block. Table body, including its continuation, not recoverable from the scan.)

* Those values not in brackets are the number of center points required for "uniform variance" and can be replaced by any other number evenly distributed between blocks. The values in brackets also provide uniform variance but cannot be changed freely without loss of orthogonality.

8.2. Non-orthogonal Blocking of the Rotatable Designs. An alternative would be to retain rotatability but to accept slightly non-orthogonal blocking. From the point of view of computational difficulty this approach turns out to be much the simpler, while the loss of information due to the slight non-orthogonality in
blocking is small. In reference [15] the moment conditions are given which the points within the individual blocks must satisfy in order to retain rotatability. In particular, it is shown that these conditions are met by any blocks which satisfy conditions 1 and 2 in Section 8.1, and hence by the submatrices $a_1D_1, a_2D_2, \ldots, a_kD_k$, whether or not they are augmented with center points. Thus, when only condition 3 is violated in blocking a rotatable design, the variance-covariance matrix of the response surface coefficients (adjusted for the block effects) retains the form necessary to give "spherical" variance contours. The form also readily lends itself to providing a general explicit solution of the normal equations. The estimates of the regression coefficients for any such arrangement are given below, where we let $\bar{y}_w$ denote the average of the observations in block $w$ and use the notation

$$[0y] = \sum_u y_u, \qquad [iy] = \sum_u x_{iu}y_u, \qquad [ijy] = \sum_u x_{iu}x_{ju}y_u,$$

$$A^{-1} = 2\lambda_4\left[(k+2)\lambda_4 - kN\sum_{w=1}^{m}\frac{[i^2]_w^2}{n_w}\right],$$

$$b_0 = N^{-1}\left[[0y] - 2A\lambda_2\lambda_4\left(\sum_{i=1}^{k}[iiy] - kN\sum_{w=1}^{m}[i^2]_w\bar{y}_w\right)\right],$$

$$b_{ii} = (2\lambda_4N)^{-1}\left[[iiy] - N\sum_{w=1}^{m}[i^2]_w\bar{y}_w + 2A\lambda_4\left(N\sum_{w=1}^{m}\frac{[i^2]_w^2}{n_w} - \lambda_4\right)\left(\sum_{j=1}^{k}[jjy] - kN\sum_{w=1}^{m}[i^2]_w\bar{y}_w\right)\right],$$

$$b_i = (\lambda_2N)^{-1}[iy], \qquad b_{ij} = (\lambda_4N)^{-1}[ijy].$$


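The variances and covariances are

$$V(b_0) = 2\sigma^2N^{-1}A\lambda_4\left[(k+2)\lambda_4 - k\left(N\sum_{w=1}^{m}\frac{[i^2]_w^2}{n_w} - \lambda_2^2\right)\right],$$

$$V(b_i) = \sigma^2(N\lambda_2)^{-1}, \qquad V(b_{ij}) = \sigma^2(N\lambda_4)^{-1},$$

$$V(b_{ii}) = \sigma^2N^{-1}A\left[(k+1)\lambda_4 - (k-1)N\sum_{w=1}^{m}\frac{[i^2]_w^2}{n_w}\right],$$

$$\operatorname{Cov}(b_0, b_{ii}) = -2\sigma^2A\lambda_2\lambda_4N^{-1}, \qquad \operatorname{Cov}(b_{ii}, b_{jj}) = \sigma^2N^{-1}A\left[N\sum_{w=1}^{m}\frac{[i^2]_w^2}{n_w} - \lambda_4\right].$$

It will be noted that the variances of $b_i$ and $b_{ij}$ are not affected by non-orthogonal blocking, but the variances of the constant term $b_0$ and of the quadratic terms $b_{ii}$ are affected. In [15] it is shown that the loss of information introduced by the small degree of non-orthogonality is small.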
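The closed-form estimates and variances for non-orthogonal submatrix blocking can be compared against a direct least squares fit. This sketch uses the $k = 2$ hexagon split into blocks $a_1D_1$ (one center point) and $a_2D_2$ (three center points), which satisfies conditions 1 and 2 but not condition 3, and a hypothetical response with arbitrary coefficients, block shifts and noise; the least squares fit uses one intercept per block, and $b_0$ is formed as $\sum_w (n_w/N)b_{0w}$. Variances are reported in units of $\sigma^2$.

```python
import numpy as np

ang = np.array([np.pi / 2, 7 * np.pi / 6, 11 * np.pi / 6])
T = np.sqrt(2) * np.column_stack([np.cos(ang), np.sin(ang)])          # a_1 D_1 for k = 2
blocks = [np.vstack([T, np.zeros((1, 2))]), np.vstack([-T, np.zeros((3, 2))])]
X = np.vstack(blocks)
N, k = X.shape
n_w = np.array([len(B) for B in blocks])
block_of = np.repeat(np.arange(len(blocks)), n_w)

# Hypothetical response: arbitrary surface, block shifts and noise.
rng = np.random.default_rng(2)
y = (5 + X @ [1.0, -2.0] + 0.5 * X[:, 0] ** 2 - 1.0 * X[:, 1] ** 2 + 0.3 * X[:, 0] * X[:, 1]
     + np.array([0.4, -0.6])[block_of] + rng.normal(scale=0.1, size=N))

# Ordinary least squares with one intercept per block.
Z = (block_of[:, None] == np.arange(len(blocks))).astype(float)
M = np.column_stack([Z, X, X ** 2, X[:, 0] * X[:, 1]])
beta, *_ = np.linalg.lstsq(M, y, rcond=None)
cov = np.linalg.inv(M.T @ M)                      # covariance matrix / sigma^2
c0 = np.concatenate([n_w / N, np.zeros(5)])       # b_0 = sum_w (n_w / N) b_0w
ols = {"b0": c0 @ beta, "bi": beta[2:4], "bii": beta[4:6], "bij": beta[6],
       "V_b0": c0 @ cov @ c0, "V_bii": cov[4, 4]}

# Closed forms for non-orthogonal submatrix blocking.
lam2 = np.mean(X[:, 0] ** 2)
lam4 = np.mean(X[:, 0] ** 2 * X[:, 1] ** 2)
i2_w = np.array([np.sum(B[:, 0] ** 2) for B in blocks]) / N  # [i^2]_w
ybar_w = np.array([y[block_of == w].mean() for w in range(len(blocks))])
phi = N * np.sum(i2_w ** 2 / n_w)                            # N sum_w [i^2]_w^2 / n_w
A = 1 / (2 * lam4 * ((k + 2) * lam4 - k * phi))
iiy = (X ** 2).T @ y
Zsum = iiy.sum() - k * N * np.sum(i2_w * ybar_w)
closed = {
    "b0": (y.sum() - 2 * A * lam2 * lam4 * Zsum) / N,
    "bi": X.T @ y / (N * lam2),
    "bii": (iiy - N * np.sum(i2_w * ybar_w) + 2 * A * lam4 * (phi - lam4) * Zsum) / (2 * lam4 * N),
    "bij": np.sum(X[:, 0] * X[:, 1] * y) / (N * lam4),
    "V_b0": 2 * A * lam4 * ((k + 2) * lam4 - k * (phi - lam2 ** 2)) / N,
    "V_bii": A * ((k + 1) * lam4 - (k - 1) * phi) / N,
}
for key in ols:
    print(key, np.round(ols[key], 6), np.round(closed[key], 6))
```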
The variance function from which the variance of an estimated value $\hat{y}$ can readily be calculated is

$$V(\hat{y}) = \sigma^2N^{-1}\left\{2A\lambda_4\left[(k+2)\lambda_4 - k\left(N\sum_{w=1}^{m}\frac{[i^2]_w^2}{n_w} - \lambda_2^2\right)\right] + \left(\lambda_2^{-1} - 4A\lambda_2\lambda_4\right)\rho^2 + A\left[(k+1)\lambda_4 - (k-1)N\sum_{w=1}^{m}\frac{[i^2]_w^2}{n_w}\right]\rho^4\right\}.$$

TABLE 7
Radius multipliers and center points for orthogonal nearly rotatable designs
(Columns: $k$; design, i.e., standard solution or basis vector; radius multipliers; "uniform variance" center points for each submatrix block. Table body not recoverable from the scan.)

9.0. A Convenient Reduced Design for $k = 7$. The design derived from the basis vector $\mathbf{Y}_3$ for the seven factor design in Section 6.2 has several interesting features, which will be discussed here. Since it requires but 56 points (plus center points) to estimate the 36 coefficients of a seven factor second degree polynomial, it is extremely efficient. The comparable central composite design [5] requires 78 points (plus center points). The vector of radius multipliers that defines this design is $\mathbf{a}'_1 = (0\; 1\; 0\; 0\; 0\; 1\; 0)$, and it thus utilizes the points specified by the matrices $D_2$ and $D_6$ only.

TABLE 8
Seven factor second order rotatable design in three levels
(Table body, the rows of $\tfrac{1}{2}D_2$ and $\tfrac{1}{2}D_6$, not recoverable from the scan.)

In seven dimensions it is possible to find a matrix $D_1$, giving the coordinates of a regular simplex, which involves only the two levels $-1$ and $+1$ for each factor. Consequently $D_2$ and $D_6$ need only involve three factor levels. Furthermore, $D_2$ and $D_6$ provide orthogonal blocks.

The $8 \times 8$ matrix $[\mathbf{1}\; D_1]$ which can be used to generate this design is

$$[\mathbf{1}\; D_1] = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & -1 & 1 & -1 & -1 & -1 \\
1 & 1 & -1 & 1 & -1 & 1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & 1 & 1 & -1 & -1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1 & 1 & -1
\end{bmatrix}.$$

Each of its rows and columns has squared vector length eight, as required, and all rows and columns are orthogonal.

The derived matrices $\tfrac{1}{2}D_2$ and $\tfrac{1}{2}D_6$ are shown in Table 8. Since multiplication by a constant is permissible, we will therefore define our derived design matrix $D$ as

$$D = \begin{bmatrix} \tfrac{1}{2}D_2 \\ \tfrac{1}{2}D_6 \end{bmatrix}.$$

The singularity of the moment matrix of this design is readily detectable by noting that all the points lie on a hypersphere of radius $3^{1/2}$, and hence center points must be added to make all coefficients separately estimable. The addition of ten such points will produce a design having the "uniform variance" property.
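The construction of this seven factor design can be sketched directly from the $8 \times 8$ matrix above (the helper name `row_sums` and the `model_matrix` function are ours). The sketch confirms the three levels, the common radius $3^{1/2}$, $[i^4] = 3[i^2j^2]$, $\lambda_4/\lambda_2^2 = 7/9 = k/(k + 2)$, the singularity of the 36-term second order model without center points and its removal by ten center points, and the first two orthogonal blocking conditions for $\tfrac{1}{2}D_2$ and $\tfrac{1}{2}D_6$.

```python
import itertools

import numpy as np

H = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, -1, 1, -1, -1, -1],
    [1, 1, -1, 1, -1, 1, -1, -1],
    [1, 1, -1, -1, -1, -1, 1, 1],
    [1, -1, 1, 1, -1, -1, 1, -1],
    [1, -1, 1, -1, -1, 1, -1, 1],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [1, -1, -1, -1, 1, 1, 1, -1],
])
D1 = H[:, 1:]  # two-level regular simplex in seven dimensions


def row_sums(D1, s):
    """All sums of the rows of D1 taken s at a time (integer arithmetic)."""
    return np.array([D1[list(c)].sum(axis=0) for c in itertools.combinations(range(len(D1)), s)])


def model_matrix(P):
    """Columns: 1, x_i, x_i^2, x_i x_j (i < j), i.e. the 36 terms of a second order model."""
    k = P.shape[1]
    cols = [np.ones(len(P))] + [P[:, i] for i in range(k)] + [P[:, i] ** 2 for i in range(k)]
    cols += [P[:, i] * P[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols).astype(float)


half_D2 = row_sums(D1, 2) // 2  # entries in {-1, 0, 1}
half_D6 = row_sums(D1, 6) // 2
D = np.vstack([half_D2, half_D6])

levels = np.unique(D)
radii_sq = np.unique((D ** 2).sum(axis=1))
x, y = D[:, 0], D[:, 1]
lam2, lam4 = np.mean(x ** 2), np.mean(x ** 2 * y ** 2)
with_centers = np.vstack([D, np.zeros((10, 7), dtype=int)])
print(D.shape, levels, radii_sq, lam4 / lam2 ** 2)
print(np.linalg.matrix_rank(model_matrix(D)), np.linalg.matrix_rank(model_matrix(with_centers)))
```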

For this design (and whenever nonorthogonal blocking does not complicate 
the normal equations) the regression coefficients and their variances are easily 
obtained from the general solutions for rotatable designs given in [5]. 


REFERENCES 


[1] Box, G. E. P., Integration of techniques in process development. Statistical Tech-
niques Research Group, Technical Report No. 2, Princeton University, Prince-
ton, N. J., 1957.

[2] Box, G. E. P. and Draper, N. R., A basis for the selection of a response surface de-
sign, Statistical Techniques Group, Technical Report No. 23, Princeton Uni-
versity, Princeton, N. J., 1958.

[3] Box, G. E. P., Use of statistical methods in the elucidation of basic mechanisms, Paper
presented at International Institute of Statistics, Stockholm, 1957.

[4] Box, G. E. P. and Lucas, H. L., "Design of experiments in non-linear situations,"
Biometrika, Vol. 46 (1959), pp. 77-90.

[5] Box, G. E. P. and Hunter, J. S., "Multifactor experimental designs for exploring
response surfaces," Ann. Math. Stat., Vol. 28 (1957), pp. 195-241.





864 G. E. P. BOX AND D. W. BEHNKEN 


[6] Box, G. E. P., "Multifactor designs of first order," Biometrika, Vol. 39 (1952), pp. 49-57.
[7] Box, G. E. P. and Wilson, K. B., "On the experimental attainment of optimum condi-
tions," J. Roy. Stat. Soc., Ser. B, Vol. 13 (1951), pp. 1-45.
[8] Tukey, John W., "Keeping moment-like sampling computations simple," Ann. Math.
Stat., Vol. 27 (1956), pp. 37-54.
[9] Wishart, J., "Moment coefficients of the k-statistics in samples from a finite popula-
tion," Biometrika, Vol. 39 (1952), pp. 1-13.
[10] Hooke, Robert, "Symmetric functions of a two-way array," Ann. Math. Stat., Vol.
27 (1956), pp. 55-79.
[11] Robson, D. S., "Applications of multivariate polykays to the theory of unbiased
ratio-type estimation," J. Amer. Stat. Assn., Vol. 52 (1957), pp. 511-522.
[12] Behnken, D. W., "Sampling moments of means from finite multivariate populations,"
submitted to Ann. Math. Stat.
[13] Box, G. E. P. and Behnken, D. W., "Derivation of second order rotatable designs
from those of first order," Statistical Techniques Research Group, Technical
Report No. 17, Princeton University, Princeton, New Jersey, 1958.
[14] Bose, R. C. and Draper, N. R., "Rotatable designs of second and third order in three
or more dimensions," Inst. of Stat. Mimeo Series No. 197, University of North
Carolina, Chapel Hill, 1958.
[15] Box, G. E. P. and Behnken, D. W., Simplex-sum designs, a class of second order
rotatable designs derivable from those of first order, Inst. of Stat. Mimeo Series
No. 232, North Carolina State College, Raleigh, North Carolina, 1959.





THIRD ORDER ROTATABLE DESIGNS IN THREE DIMENSIONS¹


By Norman R. Draper 
Mathematics Research Center, United States Army, Madison, Wisconsin 


0. Summary. Two recent papers by Bose and Draper [1] and Draper [3] showed 
how it was possible, by combining certain sets of points, to construct infinite 
classes of second order rotatable designs in three and more dimensions. In this 
paper, third order rotatable designs in three dimensions are discussed. First, a 
general theorem is proved that provides the conditions under which a third 
order rotatable arrangement of points in k dimensions is non-singular. The four 
previously known third order designs in three dimensions are stated; it is then 
shown how some of the second order design classes constructed earlier [1] may 
be combined in pairs to give infinite classes of sequential third order rotatable 
designs in three dimensions. One example of such a combination is worked out 
in full and it is shown that two of the four known designs are special cases of 
this class. A summary of further third order rotatable design classes that have 
been shown to exist, and that have been tabulated by the author, concludes the 
paper. 


1. Introduction. The technique of fitting a response surface is one widely used 
(especially in the chemical industry) to aid in the statistical analysis of experi- 
mental work in which the “yield” of a product depends, in some unknown
fashion, on one or more controllable variables. Before the details of such an
analysis can be carried out, experiments must be performed at predetermined 
levels of the controllable factors, i.e., an experimental design must be selected 
prior to experimentation. Box and Hunter [2] suggested designs of a certain type, 
which they called rotatable, as being suitable for such experimentation. Such 
designs permit a response surface to be fitted easily and provide spherical in- 
formation contours. A second order rotatable design aids the fitting of a second 
order (i.e., a quadratic) surface, and a third order rotatable design aids the fit- 
ting of a third order (i.e., a cubic) surface. 

Let us assume that the measurements of the k factors have been coded, per- 
mitting the use of cartesian axes in k-dimensional space to describe an experi- 
mental design for k factors. Suppose that, in an experimental investigation with 
k factors, N (not necessarily distinct) combinations of levels are employed. Thus 


Received December 9, 1959 

¹ The early stages of this work, at the University of North Carolina, were supported in
part by the United States Air Force through the Air Force Office of Scientific Research of 
the Air Research and Development Command under contract No. AF 18(600)-83 and in 
part by a Bell Telephone Graduate Fellowship Award to the author, who acknowledges 
gratefully his indebtedness. Sponsorship was also provided by the United States Army 
under Contract No. DA-11-022-ORD-2059 at the Mathematics Research Center. Reproduc- 
tion in whole or part is permitted for any purpose of the United States Government. 


865 







the group of N experiments which arises can be described by the N points in k 
dimensions 


$$(x_{1u}, x_{2u}, \ldots, x_{ku}), \qquad u = 1, 2, \ldots, N, \tag{1.1}$$


where, in the $u$th experiment, factor $i$ is at level $x_{iu}$. This set of points is said
to form a rotatable arrangement of the third order in $k$ factors if


$$\begin{aligned}
\sum_{u=1}^{N} x_{iu}^2 &= \lambda_2 N,\\
\sum_{u=1}^{N} x_{iu}^4 = 3\sum_{u=1}^{N} x_{iu}^2x_{ju}^2 &= 3\lambda_4 N,\\
\sum_{u=1}^{N} x_{iu}^6 = 5\sum_{u=1}^{N} x_{iu}^4x_{ju}^2 = 15\sum_{u=1}^{N} x_{iu}^2x_{ju}^2x_{lu}^2 &= 15\lambda_6 N \quad \text{(say)},
\end{aligned}\tag{1.2}$$

where $i, j, l = 1, 2, \ldots, k$, $i \ne j \ne l \ne i$, $u = 1, 2, \ldots, N$, and all other similar
sums of powers and products up to and including order six are zero. Conditions
(1.2) and the condition $\operatorname{Var}\,\hat{y}(\mathbf{x}) = f(\mathbf{x}'\mathbf{x})$ are equivalent. The $N$
points of the arrangement are said to form a rotatable design of third order if
they give rise to a non-singular $X'X$ matrix. By convention, the scale of the
design is normally adjusted so that $\lambda_2 = 1$. (This adjustment is a convenience
[1] and not an essential. In this paper designs are presented in terms of a param-
eter and scaling, which merely fixes this parameter, has not been performed.)
The conditions for non-singularity of the $X'X$ matrix for the third order arrange-
ment are


$$\lambda_4/\lambda_2^2 > k/(k+2), \tag{1.3}$$


(this, alone, is the condition that the matrix for a second order design should be 
non-singular) and 


$$\lambda_6\lambda_2/\lambda_4^2 > (k+2)/(k+4). \tag{1.4}$$


These conditions are derived by Gardiner, Grandage and Hader [4]. Note that 
the left members of (1.3) and (1.4) are independent of the scale of the design. 

For a third order design, the determinant of $X'X$ is proportional to
$[(k+2)\lambda_4 - k\lambda_2^2][(k+4)\lambda_2\lambda_6 - (k+2)\lambda_4^2]$. Thus, if either of these factors is
zero, $X'X$ is singular and some of the coefficients in the third order polynomial to
be fitted by least squares to the experimental results are not estimable. (These
coefficients are given by $b = (X'X)^{-1}X'Y$, in the usual notation, when $X'X$ is
non-singular.) If either factor is very near to zero, some of the variances of the
estimates are large and the design is said to be almost singular. It is impossible
for either factor to be less than zero, i.e., for either of the inequalities (1.3) and
(1.4) to be reversed, as will be shown.
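The two factors can be evaluated directly. The Python sketch below uses hypothetical helper names and exact fractions. It returns both factors and reports whether (1.3) and (1.4) hold strictly. The $\lambda$ values in the checks are illustrative only. $\lambda_2 = 1$, $\lambda_4 = 2/3$, $\lambda_6 = 1/3$ with $k = 4$ give $\lambda_4/\lambda_2^2 = k/(k+2)$ and $\lambda_6\lambda_2/\lambda_4^2 = (k+2)/(k+4)$, so both factors vanish. $\lambda_4 = 5/6$, $\lambda_6 = 7/12$ give two positive factors.

```python
from fractions import Fraction


def third_order_factors(k, lam2, lam4, lam6):
    """Return ((k+2)*lam4 - k*lam2**2, (k+4)*lam2*lam6 - (k+2)*lam4**2).

    Per the text, det(X'X) for a third order rotatable arrangement is
    proportional to the product of these factors.
    """
    first = (k + 2) * lam4 - k * lam2 ** 2
    second = (k + 4) * lam2 * lam6 - (k + 2) * lam4 ** 2
    return first, second


def satisfies_13_14(k, lam2, lam4, lam6):
    """True when both (1.3) and (1.4) hold, i.e. both factors are strictly positive."""
    first, second = third_order_factors(k, lam2, lam4, lam6)
    return first > 0 and second > 0


if __name__ == "__main__":
    print(third_order_factors(4, Fraction(1), Fraction(2, 3), Fraction(1, 3)))
    print(satisfies_13_14(4, Fraction(1), Fraction(5, 6), Fraction(7, 12)))
```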

Since the left member of (1.3) is of order $N$, this first inequality may always
be satisfied merely by an increase in $N$ which leaves the original points unaltered
and which adds nothing to the sums of powers and products, namely by the
addition of center points. However, the left member of (1.4) is of order zero in
$N$ and depends only on the points $(x_{1u}, \ldots, x_{ku})$, $u = 1, 2, \ldots, N$. Thus, if a
third order arrangement is singular (i.e., gives rise to a singular matrix) because
equality is attained in (1.4), the situation cannot be altered by the addition of
center points. The question now arises: under what conditions is a third order
arrangement singular? An exact answer to the question may be given as follows.

Theorem: A third order rotatable design in $k$ dimensions is singular if and only
if all of its points, excluding center points, lie on a $k$ dimensional sphere centered at
the origin.

Proof: Let

$$S_i^{(r)} = \sum_{u=1}^{N} x_{iu}^r.$$

Then, if the points (1.1) satisfy the conditions (1.2),

$$S_i^{(4)} = \sum_{u=1}^{N} x_{iu}^4 = 3\sum_{u=1}^{N} x_{iu}^2x_{ju}^2,$$

$$S_i^{(6)} = \sum_{u=1}^{N} x_{iu}^6 = 5\sum_{u=1}^{N} x_{iu}^4x_{ju}^2 = 15\sum_{u=1}^{N} x_{iu}^2x_{ju}^2x_{lu}^2,$$

where $i, j, l = 1, 2, \ldots, k$, $i \ne j \ne l \ne i$, and $S_i^{(r)}$ may be denoted by $S_r$ since it
is independent of $i$ if conditions (1.2) hold. It follows that
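$$\begin{aligned}
kS_6 = \sum_{i=1}^{k} S_i^{(6)} &= \sum_{u=1}^{N}\left(x_{1u}^6 + x_{2u}^6 + \cdots + x_{ku}^6\right)\\
&= \sum_{u=1}^{N}\Big[\left(x_{1u}^2 + \cdots + x_{ku}^2\right)^3 - 3\sum_{i\ne j} x_{iu}^4x_{ju}^2 - \sum_{i\ne j\ne l\ne i} x_{iu}^2x_{ju}^2x_{lu}^2\Big]\\
&= \sum_{u=1}^{N} r_u^6 - 3k(k-1)\sum_{u=1}^{N} x_{iu}^4x_{ju}^2 - k(k-1)(k-2)\sum_{u=1}^{N} x_{iu}^2x_{ju}^2x_{lu}^2\\
&= \sum_{u=1}^{N} r_u^6 - 3k(k-1)S_6/5 - k(k-1)(k-2)S_6/15,
\end{aligned}$$

where $r_u^2 = x_{1u}^2 + x_{2u}^2 + \cdots + x_{ku}^2$. The inner sums in the second line run over ordered
pairs and ordered triples of distinct indices. By (1.2), each such term has the same
total over $u$ whatever the indices. Solving for $S_6$ and referring to (1.2), we obtain

$$15\lambda_6 N = S_6 = 15\sum_{u=1}^{N} r_u^6 \Big/ k(k+2)(k+4).$$

Similarly

$$3\lambda_4 N = S_4 = 3\sum_{u=1}^{N} r_u^4 \Big/ k(k+2), \qquad \lambda_2 N = S_2 = \sum_{u=1}^{N} r_u^2 \Big/ k.$$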
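The algebra of the last two steps can be checked mechanically. The Python sketch below uses our own hypothetical helper names. It splits $(\sum a_i)^3$ into the three sums used above, with $a_i$ standing for $x_{iu}^2$ at a fixed point $u$ and with ordered pairs and triples of distinct indices. It then confirms that collecting the $S_6$ terms gives $kS_6\,[15 + 9(k-1) + (k-1)(k-2)]/15 = \sum r_u^6$, with the bracket factoring as $(k+2)(k+4)$. The analogous fourth order bracket $3 + (k-1)$ equals $k + 2$.

```python
from itertools import permutations


def cube_expansion_terms(a):
    """Return (sum a_i^3, sum_{i != j} a_i^2 a_j, sum_{i, j, l distinct} a_i a_j a_l).

    Pairs and triples are ordered, which is what makes
    (sum a_i)^3 = first + 3 * second + third hold.
    """
    k = len(a)
    pure = sum(x ** 3 for x in a)
    pairs = sum(a[i] ** 2 * a[j] for i, j in permutations(range(k), 2))
    triples = sum(a[i] * a[j] * a[l] for i, j, l in permutations(range(k), 3))
    return pure, pairs, triples


def s6_bracket(k):
    """Fifteen times the multiplier of k*S_6 after moving the S_6 terms to the left."""
    return 15 + 9 * (k - 1) + (k - 1) * (k - 2)


def s4_bracket(k):
    """Three times the multiplier of k*S_4 in the corresponding fourth order step."""
    return 3 + (k - 1)


if __name__ == "__main__":
    print(cube_expansion_terms([1, 2, 3, 5]))
    print([(k, s6_bracket(k), (k + 2) * (k + 4)) for k in range(1, 6)])
```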


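Hence

$$\frac{\lambda_4}{\lambda_2^2} = \frac{k}{k+2}\cdot\frac{N\sum_{u=1}^{N} r_u^4}{\left(\sum_{u=1}^{N} r_u^2\right)^2}.$$

(This expression is equal to $k/(k+2)$ if and only if all points lie on a sphere,
and it can be increased merely by an increase in $N$, i.e., by the addition of center
points.) Also

$$\frac{\lambda_6\lambda_2}{\lambda_4^2} = \left(\frac{\sum_{u=1}^{N} r_u^6\,\sum_{u=1}^{N} r_u^2}{\left(\sum_{u=1}^{N} r_u^4\right)^2}\right)\frac{k+2}{k+4} = F\,\frac{k+2}{k+4}, \quad \text{say}.$$

Now, by the Cauchy-Schwarz inequality [5] applied to the vectors $(r_u^3)$ and $(r_u)$,
$\left(\sum r_u^4\right)^2 \le \sum r_u^6 \sum r_u^2$. The factor $F$ is therefore greater than unity
unless all the non-zero $r_u$ are equal, when unity is attained and the right hand
side reduces to $(k+2)/(k+4)$. Thus the arrangement is singular if and only
if all its points (excluding center points) lie on a sphere in $k$ dimensions.
Hence in order to get usable third order designs, we must combine at least
two spherical sets of points with different positive radii.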
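A small computation shows the theorem at work. Given the squared radii $r_u^2$ of the points, the Python sketch below (hypothetical helper names) computes $F$ and $\lambda_6\lambda_2/\lambda_4^2 = F(k+2)/(k+4)$. It assumes the arrangement is already third order rotatable, so that the formulas just derived apply. For example, the 48-point set $S(a)$ of the four-dimensional note later in this issue has $r_u^2 = 4a^2$ for every point. Center points, with $r_u = 0$, leave $F$ unchanged.

```python
from fractions import Fraction


def radial_factor(radii_sq):
    """F = (sum r^6)(sum r^2) / (sum r^4)^2 computed from squared radii r_u^2.

    Center points (r_u = 0) add nothing to any sum. At least one
    non-central point is required, otherwise the denominator is zero.
    """
    s2 = sum(radii_sq)
    s4 = sum(q ** 2 for q in radii_sq)
    s6 = sum(q ** 3 for q in radii_sq)
    return Fraction(s6 * s2, s4 ** 2)


def lambda_ratio(radii_sq, k):
    """lambda6*lambda2/lambda4**2 = F (k+2)/(k+4) for a third order rotatable arrangement."""
    return radial_factor(radii_sq) * Fraction(k + 2, k + 4)


if __name__ == "__main__":
    one_sphere = [4] * 48                # all points on one sphere: singular
    two_spheres = [4] * 48 + [12] * 48   # two different radii: F > 1
    print(radial_factor(one_sphere), lambda_ratio(one_sphere, 4))
    print(radial_factor(two_spheres), lambda_ratio(two_spheres, 4))
```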


2. The known third order rotatable designs in three dimensions. We may 
divide third order designs into two groups, sequential and non-sequential. A 
sequential design can be performed in two parts. One part is a second order ro- 
tatable design which may be run first; then, if the second order polynomial 
approximation is found to be inadequate, the trials of the second part may be 
run and a third order surface fitted. Such designs are more useful in practice 
than the non-sequential type, of which all the trials must be run at one time in 
order to make a rotatable least squares fitting possible. 

Only four third order designs are known in three dimensions [4]; these con- 
tain points which are the vertices of 


(2.1)
(a) icosahedron plus dodecahedron (32 points),
(b) cube plus two octahedra plus cuboctahedron (32 points),
(c) (cube plus octahedron) plus (truncated cube plus octahedron) (44 points),
(d) (cube plus doubled octahedron) plus (truncated cube plus octahedron) (50 points).


Bose and Draper introduced [1] a point set notation in which the 24 points
$(\pm p, \pm q, \pm r)$, $(\pm q, \pm r, \pm p)$, $(\pm r, \pm p, \pm q)$ were denoted by $G(p, q, r)$.
The eight points $(\pm a, \pm a, \pm a)$ were then denoted by $\tfrac13 G(a, a, a)$, since $G(a, a, a)$
consists of the eight points $(\pm a, \pm a, \pm a)$ three times over. Other sets of points,
derivable from $G(p, q, r)$ by setting some of $p$, $q$ and $r$ zero or equal to one
another, were similarly described. If we translate the designs (2.1) into this
notation, they become


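$$\begin{aligned}
&\text{(a)}\quad \tfrac12 G(p_1, q_1, 0) + \tfrac12 G(p_2, q_2, 0) + \tfrac13 G(a, a, a),\\
&\text{(b)}\quad \tfrac13 G(a, a, a) + \tfrac14 G(c_1, 0, 0) + \tfrac14 G(c_2, 0, 0) + \tfrac12 G(f, f, 0),\\
&\text{(c)}\quad [\tfrac13 G(a, a, a) + \tfrac14 G(c_1, 0, 0)] + [G(p, q, q) + \tfrac14 G(c, 0, 0)],\\
&\text{(d)}\quad [\tfrac13 G(a, a, a) + \tfrac12 G(c_1, 0, 0)] + [G(p, q, q) + \tfrac14 G(c, 0, 0)],
\end{aligned}\tag{2.2}$$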







where in each case the values of the parameters are determined by the rota- 
tability conditions, and will not be quoted here. 

Design (a) is such that the radii of the two spheres on which all its points lie
are very nearly equal. In fact, $\lambda_6\lambda_2/\lambda_4^2 = (1.00028)\,5/7$, which means that the
design is almost singular. Thus the variances of the linear and cubic coefficients
are very large. This design is of the sequential type. Design (b) is a combination
of our basic generated sets and is non-sequential. Designs (c) and (d) are both
sequential; (c) is almost singular, as was noted in the original presentation.


3. The construction of infinite classes of third order rotatable designs in three 
dimensions. We shall now show how to obtain infinite classes of third order 
designs of the sequential type by making use of the previously derived [1] second 
order classes. In order to do this, we shall find it necessary to construct addi- 
tional functions similar to the excess function previously introduced in [1]. 

We recall that

$$E_x[G(p,q,r)] = 8(p^4 + q^4 + r^4 - 3p^2q^2 - 3q^2r^2 - 3r^2p^2). \tag{3.1}$$

Additionally we define

$$\begin{aligned}
A_x[G(p,q,r)] &= 8(p^2 + q^2 + r^2),\\
G_x[G(p,q,r)] &= 8(p^4q^2 + q^4r^2 + r^4p^2 - p^2q^4 - q^2r^4 - r^2p^4),\\
H_x[G(p,q,r)] &= 8(p^6 + q^6 + r^6 - 45p^2q^2r^2),\\
I_x[G(p,q,r)] &= 4(p^4q^2 + q^4r^2 + r^4p^2 + p^2q^4 + q^2r^4 + r^2p^4 - 18p^2q^2r^2).
\end{aligned}\tag{3.2}$$

Note that, for the point set $G(p, q, r)$,

$$\begin{aligned}
\sum_u x_{iu}^2 &= A_x(G), \qquad \sum_u x_{iu}^4 = 8(p^4 + q^4 + r^4),\\
\sum_u x_{iu}^2x_{ju}^2 &= 8(p^2q^2 + q^2r^2 + r^2p^2),\\
\sum_u x_{iu}^4x_{ju}^2 &= 8(p^4q^2 + q^4r^2 + r^4p^2) \quad (i > j),\\
\sum_u x_{iu}^2x_{ju}^4 &= 8(p^2q^4 + q^2r^4 + r^2p^4) \quad (i > j),\\
\sum_u x_{iu}^6 &= 8(p^6 + q^6 + r^6), \qquad \sum_u x_{iu}^2x_{ju}^2x_{lu}^2 = 24p^2q^2r^2,
\end{aligned}\tag{3.3}$$

where $i \ne j \ne l \ne i$; $i, j, l = 1, 2, 3$ and $u = 1, 2, \ldots, N$. The notation $i > j$
here denotes that $i$ is before $j$ in cyclic order, i.e., $1 > 2$, $2 > 3$, $3 > 1$. Con-
sideration of (3.2) and (3.3) together with (1.2) shows that if

$$E_x(G) = G_x(G) = H_x(G) = I_x(G) = 0, \tag{3.4}$$

then the points of $G(p, q, r)$ form a rotatable arrangement of the third order.
All of the excess functions we have defined operate linearly on sets of points of
the form $G(p, q, r)$ and fractions of $G(p, q, r)$, that is to say $Q_x\left(\sum_i S_i\right) =
\sum_i Q_x(S_i)$, where $Q$ represents any of $E$, $G$, $H$ or $I$. Thus if the four functions
of (3.4) are zero for any aggregate of points, then this aggregate forms a third
order rotatable arrangement. The arrangement will be a design provided that
the non-singularity conditions are satisfied.
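The excess functions (3.2) can be checked by brute force. The Python sketch below is our own, not code from the paper. It enumerates the 24 points of $G(p, q, r)$ with their repetitions and forms the sums of (3.3) for $i = 1$, $j = 2$, $l = 3$ (so $i > j$ cyclically). It then combines them as required by (1.2). A fraction such as $\tfrac13 G(a, a, a)$ is represented by a weight on the sums, and by linearity the excess functions of an aggregate are the sums over its parts. The dictionary keys are arbitrary labels.

```python
from fractions import Fraction
from itertools import product


def G(p, q, r):
    """The 24 points (+-p,+-q,+-r), (+-q,+-r,+-p), (+-r,+-p,+-q), repeats kept."""
    pts = []
    for a, b, c in ((p, q, r), (q, r, p), (r, p, q)):
        for s1, s2, s3 in product((1, -1), repeat=3):
            pts.append((s1 * a, s2 * b, s3 * c))
    return pts


def excess(points, weight=1):
    """E_x, G_x, H_x, I_x of a weighted point set, using factors 1, 2, 3."""
    def total(f):
        return weight * sum(f(x1, x2, x3) for x1, x2, x3 in points)

    s4 = total(lambda a, b, c: a ** 4)
    s22 = total(lambda a, b, c: a ** 2 * b ** 2)
    s42 = total(lambda a, b, c: a ** 4 * b ** 2)
    s24 = total(lambda a, b, c: a ** 2 * b ** 4)
    s6 = total(lambda a, b, c: a ** 6)
    s222 = total(lambda a, b, c: a ** 2 * b ** 2 * c ** 2)
    return {
        "E": s4 - 3 * s22,                          # sum x^4 - 3 sum x^2 x^2
        "G": s42 - s24,                             # asymmetry between orientations
        "H": s6 - 15 * s222,                        # sum x^6 - 15 sum x^2 x^2 x^2
        "I": Fraction(s42 + s24, 2) - 3 * s222,     # mean of x^4 x^2 terms - 3 sum x^2 x^2 x^2
    }


def combine(*parts):
    """Excess functions of an aggregate of (points, weight) parts, by linearity."""
    out = {"E": 0, "G": 0, "H": 0, "I": 0}
    for points, weight in parts:
        for key, value in excess(points, weight).items():
            out[key] += value
    return out


if __name__ == "__main__":
    print(excess(G(1, 2, 3)))
    # (1/3)G(a,a,a) + (1/4)G(c1,0,0) + (1/4)G(c2,0,0) with a = 1, c1 = 1, c2 = 2
    print(combine((G(1, 1, 1), Fraction(1, 3)), (G(1, 0, 0), Fraction(1, 4)),
                  (G(2, 0, 0), Fraction(1, 4))))
```

For the cube-plus-two-octahedra class, for instance, the combined $E_x$ is $2(c_1^4 + c_2^4 - 8a^4)$. This is the first equation of the system worked out below.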

Listed in Table I are the generated sets of the form $G(p, q, r)$, or fractions
thereof, which were used previously [1], together with the values of the excess
functions for each set.

It is, of course, possible to form non-sequential third order designs and classes
of designs by a skillful combination of these sets. We have already said that
design (2.1)(b) is of this type. We shall leave aside this possibility and instead
form some infinite classes of designs that may be performed sequentially. Since,
for sequential performance, each of the two parts of the design must itself be
a second order design, we shall employ some of the infinite classes of second
order designs already obtained. Table II contains a number of unscaled second
order design classes which may be used, and the values of the various excess
functions for each class are shown. Since each class satisfies the second order
conditions, $E_x(\text{class}) = 0$, as is indicated in the table. The classes we shall
consider, which are obtainable from the basic generated point sets, have
$G_x(\text{class}) = 0$. Each class contains three parameters which give rise to two
ratios connected by one equation ($E_x = 0$). If we combine two such classes and
apply the other conditions of (3.4), we shall have a set of points with six pa-
rameters giving five ratios connected by four equations. Thus we shall obtain


[Table I: generated point sets of the form G(p, q, r), or fractions thereof, with the values of their excess functions.]

* For these two sets a unique expression for $I_x$ does not exist since there is a
lack of symmetry. The two possible values of the expression $I_x$ are shown; they
are equal when p = q.

[Table II: unscaled second order design classes, with the values of their excess functions.]


a single infinity of third order rotatable arrangements dependent on one pa- 
rameter ratio. 

We shall now illustrate by an example the formation of infinite classes of third 
order rotatable arrangements by the combination, in pairs, of certain of $D_1$,
$D_2, \ldots, D_6$ and the application of conditions (3.4).

Consider the combination D, + Dz, containing 50 points. These points form 
a sequential rotatable arrangement in three dimensions if all the excess func- 
tions are zero, namely if 


(3.5) Ex(D,) = Ex( Ds) = Gz(D, + De) = Hx(D, + Dg) = Ix(D, + De) = 0. 
In full, these equations are

$$\begin{aligned}
c_1^4 + c_2^4 - 8a^4 &= 0,\\
4(p^4 - q^4 - 6p^2q^2) + c^4 &= 0,\\
c_1^6 + c_2^6 - 56a^6 + 4(p^6 + 2q^6 - 45p^2q^4) + c^6 &= 0,\\
-2a^6 + p^4q^2 + q^6 - 8p^2q^4 &= 0.
\end{aligned}\tag{3.6}$$

Make the substitutions

$$c_1^2 = xa^2, \quad c_2^2 = ya^2, \quad p^2 = uc^2, \quad q^2 = vc^2, \quad c^6 = ta^6. \tag{3.7}$$

Since equations (3.6) are homogeneous, they may be put in the form

$$\begin{aligned}
x^2 + y^2 &= 8,\\
u^2 - 6uv - v^2 + \tfrac14 &= 0,\\
x^3 + y^3 - 56 + (4u^3 + 8v^3 - 180uv^2 + 1)t &= 0,\\
-2 + (u^2v + v^3 - 8uv^2)t &= 0,
\end{aligned}\tag{3.8}$$


a system of four equations in five unknowns; thus if one variable is specified,
the values of the other variables are determined. However, we are interested only
in solutions for which $x$, $y$, $u$, $v$ and $t$ are all real and positive. Only in such a
case will a rotatable arrangement exist. Simple algebraic solution of the equa-
tions (3.8) is not possible. We proceed by selecting one variable and obtaining
the others successively, applying the conditions for positive solutions as we go.
Select $v > 0$. Then from the second equation, $u = 3v \pm \tfrac12(40v^2 - 1)^{1/2}$. Using the
second equation to eliminate $u^2$ from the fourth, we obtain $2t^{-1} = \mp v^2(40v^2 - 1)^{1/2} - 4v^3 - v/4$.
Now $t$ must be positive, whereas with the upper sign every term on the right is negative.
Thus the top root alternative is impossible, which means that

$$u = 3v - \tfrac12(40v^2 - 1)^{1/2} \tag{3.9}$$

and

$$2t^{-1} = v^2(40v^2 - 1)^{1/2} - 4v^3 - v/4. \tag{3.10}$$


Now $u$ is real and non-negative only if $0.025 \le v^2 \le 0.25$. Also $t > 0$ only if
$384v^4 - 48v^2 - 1 > 0$, that is, $v^2 > 0.143187$. Thus we shall require

$$0.143187 < v^2 \le 0.25 \tag{3.11}$$

in order that all of $t$, $u$ and $v$ shall be real and positive. By substituting for $u$
and $t$ in the third equation we find that $x^3 + y^3 = f(v)$, where

$$f(v) = 24 - \frac{4\left[2(16v^2 - 7v + 1) + (1 + 8v - 24v^2)(40v^2 - 1)^{1/2}\right]}{v(384v^4 - 48v^2 - 1)}.$$

But since $x^2 + y^2 = 8$, real, non-negative solutions exist for $x$ and $y$ only when
$16 \le f(v) \le 16(2)^{1/2} = 22.627417$. The two limits are attained at $x = y = 2$ and at
$x = 2(2)^{1/2}$, $y = 0$ (or vice versa). The range of $v$ for this to be true is more
difficult to find and involves considerable computation. We find, considering
only points in the range (3.11), that $f(0.419894) = 16$ and $f(0.466316) = 16(2)^{1/2}$,
and that $f(v)$ increases monotonically from its lower value (16) to its upper value
$[16(2)^{1/2}]$ for $v$ in the indicated range. This may be observed from the summary
table of solutions to be presented later. Thus we see that whenever

$$0.419894 \le v \le 0.466316, \quad \text{i.e.,} \quad 0.176311 \le v^2 \le 0.217451, \tag{3.12}$$


then equations (3.8) have a solution that gives rise to a third order rotatable
arrangement. We have already obtained both $t$ and $u$ in terms of $v$. It remains
only to express $x$ and $y$ in terms of $v$. We recall that

$$x^2 + y^2 = 8, \qquad x^3 + y^3 = f(v). \tag{3.13}$$

Set

$$x + y = 2\delta, \qquad xy = \theta; \tag{3.14}$$

then the first equation of (3.13) gives $\theta = 2\delta^2 - 4$. Substituting in the second, since
$x^3 + y^3 = (x + y)^3 - 3xy(x + y)$, we find $4\delta(6 - \delta^2) = f(v)$. This is a cubic which, given $v$,
may be solved for $\delta = \delta(v)$, either iteratively or by the trigonometric method
for solution of cubics. From (3.13) and (3.14), $x, y = \delta \pm (4 - \delta^2)^{1/2}$, which
are functions of $v$ only. These calculations were carried out for 12 values of $v$





TABLE III
A Third Order Rotatable Design Class

v          u          t           x          y      λ2N/a²      λ4N/a⁴      λ6N/a⁶      λ4/(λ2²N)   λ6λ2/λ4²
0.419894   0.029596   61.248478   2          2      ...         33.005020   15.670608   0.012542    0.737934
0.420      0.029553   61.069211   2.073576   ...    ...         32.963451   15.640735   0.012543    0.737912
0.425      0.027503   53.553302   2.438052   ...    ...         31.187123   14.384046   0.012604    0.735680
0.430      0.025484   47.517331   2.563086   ...    ...         29.705827   13.373640   0.012671    0.733706
0.435      0.023497   42.568200   2.640568   ...    47.249791   28.440962   12.542421   0.012743    0.732211
0.440      0.021539   38.440873   2.695576   ...    46.195202   27.368009   11.847274   0.012825    0.730679
0.445      0.019610   34.949405   2.735256   ...    45.242534   26.428609   11.267215   0.012912    0.729171
0.450      0.017709   31.960134   2.765977   ...    44.350086   25.599301   10.750681   0.013010    0.727717
0.455      0.015834   29.374247   2.790168   ...    43.531524   24.864635   10.310055   0.013121    0.726003
0.460      0.013984   27.117168   2.809441   ...    42.726260   24.208077    9.925752   0.013261    0.723665
0.465      0.012159   25.131560   2.824930   ...    41.967003   23.617604    9.585751   0.013474    0.719404
0.466316   0.011682   24.648331   2.828428   0      41.462397   23.471355    9.502710   0.013653    0.715107

Entries marked ... are illegible in this copy; y may be recovered from x² + y² = 8.


in the range (3.12), including the end points of the range, and the results are
shown in the first five columns of Table III. Any line of the table gives five
ratios which may be employed in (3.7) to give five of the parameters $c_1$, $c_2$,
$a$, $p$, $q$ and $c$ in terms of the sixth. (After the addition of any center points to
be used, the sixth parameter can be fixed by applying the scaling condition
$\lambda_2 = 1$.) Thus we obtain a rotatable arrangement which is a design if the non-
singularity conditions are satisfied. Since the first of these can be satisfied by
the addition of center points, it need not be considered further. We require, then,
that $\lambda_6\lambda_2/\lambda_4^2 > 5/7 = 0.714286$. By our theorem, this will be so unless all the
points lie on one sphere. Now each design consists of five separate point sets of
squared radii $3a^2$, $c_1^2$, $c_2^2$, $p^2 + 2q^2$ and $c^2$, or $3a^2$, $xa^2$, $ya^2$, $(u + 2v)c^2$ and $c^2$.
It is easy to see from the table that the various radii are different. The actual
values of the parameters are given by:

$$\begin{aligned}
\lambda_2 N &= a^2(8 + 2x + 2y) + [8(u + 2v) + 2]c^2,\\
\lambda_4 N &= 8a^4 + (16uv + 8v^2)c^4,\\
\lambda_6 N &= 8a^6 + 24uv^2c^6.
\end{aligned}$$

Since $c^6 = ta^6$, these values may be found in terms of $a$, as shown in Table III.
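The computation behind a line of Table III can be sketched in Python. This is a minimal reconstruction under our own assumptions, not Draper's procedure, and the helper names are hypothetical. It evaluates (3.9), (3.10) and the closed form for $f(v)$. It solves $4\delta(6 - \delta^2) = f(v)$ by bisection on $\sqrt{2} \le \delta \le 2$, where the left side decreases. It then forms $\lambda_2N/a^2$, $\lambda_4N/a^4$ and $\lambda_6N/a^6$ from the three expressions above, using $c^2/a^2 = t^{1/3}$.

```python
import math


def f_of_v(v):
    """f(v) = x**3 + y**3 after eliminating u and t; meant for 0.143187 < v**2 <= 0.25."""
    s = math.sqrt(40 * v**2 - 1)
    num = 2 * (16 * v**2 - 7 * v + 1) + (1 + 8 * v - 24 * v**2) * s
    den = v * (384 * v**4 - 48 * v**2 - 1)
    return 24 - 4 * num / den


def solve_delta(fv, iterations=200):
    """Solve 4*delta*(6 - delta**2) = fv for delta in [sqrt(2), 2] by bisection.

    delta*(6 - delta**2) decreases on this interval from 4*sqrt(2) to 4, so a
    root exists for 16 <= fv <= 16*sqrt(2); otherwise an end point is returned.
    """
    lo, hi = math.sqrt(2), 2.0
    target = fv / 4
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid * (6 - mid**2) > target:  # still above the target: root lies further right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def table_row(v):
    """Hypothetical helper: one line of Table III for a v in the range (3.12)."""
    s = math.sqrt(40 * v**2 - 1)
    u = 3 * v - s / 2                          # (3.9)
    t = 2 / (v**2 * s - 4 * v**3 - v / 4)      # (3.10)
    delta = solve_delta(f_of_v(v))
    root = math.sqrt(max(0.0, 4 - delta**2))
    x, y = delta + root, delta - root          # x >= y, from (3.13)-(3.14)
    c2 = t ** (1 / 3)                          # c**2 / a**2, since c**6 = t a**6
    lam2N = 8 + 2 * x + 2 * y + (8 * (u + 2 * v) + 2) * c2
    lam4N = 8 + (16 * u * v + 8 * v**2) * c2**2
    lam6N = 8 + 24 * u * v**2 * t
    return {"u": u, "t": t, "x": x, "y": y, "lam2N": lam2N, "lam4N": lam4N,
            "lam6N": lam6N, "ratio": lam6N * lam2N / lam4N**2}


if __name__ == "__main__":
    for v in (0.420, 0.440, 0.460):
        row = table_row(v)
        print(v, {k: round(val, 6) for k, val in row.items()})
```

The printed values can be compared with the corresponding lines of Table III. Agreement should be close in the early digits, and any small differences in the last digits would reflect rounding in the hand computation or in this sketch.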
We now examine further the extreme cases of the table. The bottom line, where
$y = 0$ and hence $c_2 = 0$, gives a design consisting of

$$[\tfrac13 G(a, a, a) + \tfrac14 G(c_1, 0, 0) + \tfrac14 G(0, 0, 0)] + [G(p, q, q) + \tfrac14 G(c, 0, 0)]$$

with values of the parameter ratios as derived above. Reference to (2.2) will
show that this is known design (c) with six center points (represented by
$\tfrac14 G(0, 0, 0)$). The top line, where $x = y = 2$ and hence $c_1 = c_2$, gives a design consisting of

$$[\tfrac13 G(a, a, a) + \tfrac14 G(c_1, 0, 0) + \tfrac14 G(c_1, 0, 0)] + [G(p, q, q) + \tfrac14 G(c, 0, 0)]$$

with the values of the parameter ratios as derived above. Since the two coincident
octahedra form the doubled octahedron $\tfrac12 G(c_1, 0, 0)$, reference to (2.2)
will show that this is known design (d). The details of the verifications will not







be reproduced here. Thus the infinite class of third order rotatable designs ob- 
tained has, as its two extreme cases, two of the designs already known, and the 
passage from one extreme to the other is by a continuous infinite sequence of 
new third order designs for which the second non-singularity condition becomes 
successively stronger. 

The class of designs just obtained was chosen for a detailed presentation be- 
cause of its link with the only sequential type designs known previously (ig- 
noring the claims of design (2.2) (a) which is almost singular). 


4. Further classes of third order rotatable designs in three dimensions. 
Several other combinations of $D_1, D_2, \ldots, D_6$ also give rise to infinite classes
of third order designs. Of the 15 possible pairs, D, + D,, D,; + De, D. + Ds, 
D, + Dy, D2 + De, Ds + Ds, Ds + Ds and D, + D, all provide third order de- 
signs and these have been tabulated in the same way as the example of the pre- 
vious section. The combinations D, + D, , D,; + D,;, D; + Ds and D, + D; do 
not give third order designs. The remaining three combinations, D, + D;, Ds 
+ D,and D; + D, have not yet been investigated. 

The intention of this paper is to show how the design classes can be constructed
and to indicate which classes are known to exist. It is hoped to present, in a
future report, some specific single designs (selected from the infinite classes
mentioned above) in a form in which they can be used conveniently by experi-
menters.


5. Acknowledgment. I am grateful to Dr. R. C. Bose for his guidance and 
encouragement during the preparation of this paper. 


REFERENCES 


[1] R. C. Bose and Norman R. Draper, "Second order rotatable designs in three dimen-
sions," Ann. Math. Stat., Vol. 30 (1959), pp. 1097-1112.

[2] G. E. P. Box and J. S. Hunter, "Multi-factor experimental designs," Ann. Math. Stat.,
Vol. 28 (1957), pp. 195-241.

[3] Norman R. Draper, "Second order rotatable designs in four or more dimensions,"
Ann. Math. Stat., Vol. 31 (1960), pp. 23-33.

[4] D. A. Gardiner, A. H. E. Grandage and R. J. Hader, "Third order rotatable de-
signs for exploring response surfaces," Ann. Math. Stat., Vol. 30 (1959), pp.
1082-1096.

[5] G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge University
Press, 1952.








A THIRD ORDER ROTATABLE DESIGN IN FOUR DIMENSIONS¹


By Norman R. Draper 
Mathematics Research Center, United States Army, Madison, Wisconsin 


1. Introduction. Only one third order rotatable design in four dimensions is
known, namely the 128 point design presented by Gardiner, Grandage and
Hader [2]. This is formed by combining the 96 points

$$\begin{aligned}
&(\pm c, \pm c, \pm d, \pm d), && (\pm c, \pm d, \pm c, \pm d), && (\pm c, \pm d, \pm d, \pm c),\\
&(\pm d, \pm c, \pm c, \pm d), && (\pm d, \pm c, \pm d, \pm c), && (\pm d, \pm d, \pm c, \pm c),
\end{aligned}$$

of the “truncated cube (2)” with the 16 points arising from two cross-polytopes

$$(\pm c_i, 0, 0, 0), \quad (0, \pm c_i, 0, 0), \quad (0, 0, \pm c_i, 0), \quad (0, 0, 0, \pm c_i), \qquad i = 1, 2,$$

and the 16 points of a measure polytope

$$(\pm a, \pm a, \pm a, \pm a),$$

where $a$, $c_1$, $c_2$, $c$ and $d$ take appropriate values obtained after considerable compu-
tation.

We now give a very simple design which requires 96 points (32 less) and is a 
combination of four second order rotatable arrangements, each containing 24 
points. This permits the use of four convenient blocks of equal size or, alterna- 
tively, sequential performance in several ways. The notation and definitions of 
reference [1] are used in obtaining this design. 


2. The construction of the design. Consider in four dimensions (a) the 24
points

$$\begin{aligned}
&(\pm p, \pm p, 0, 0), && (\pm p, 0, \pm p, 0), && (\pm p, 0, 0, \pm p),\\
&(0, \pm p, \pm p, 0), && (0, \pm p, 0, \pm p), && (0, 0, \pm p, \pm p),
\end{aligned}$$

which we shall denote by $S(p, p, 0, 0)$; (b) the 8 points

$$(\pm c, 0, 0, 0), \quad (0, \pm c, 0, 0), \quad (0, 0, \pm c, 0), \quad (0, 0, 0, \pm c),$$

which we shall denote by $S(c, 0, 0, 0)$; (c) the 16 points

$$(\pm a, \pm a, \pm a, \pm a),$$

which we shall denote by $S(a, a, a, a)$.
Applying the formulae for the various excess functions (as defined in [1])
for these sets, we find the results shown in Table I.


Received December 9, 1959. 

¹ This research was supported at the University of North Carolina by the United States
Air Force through the Air Force Office of Scientific Research of the Air Research and De- 
velopment Command, under Contract No. AF 18(600)-83. Reproduction in whole or part 
is permitted for any purpose of the United States Government. 












TABLE I
The Excess Functions for Three Chosen Point Sets

                     S(p, p, 0, 0)    S(c, 0, 0, 0)    S(a, a, a, a)
Number of points          24                8               16
A_x                      12p²              2c²             16a²
E_x                        0               2c⁴            −32a⁴
H_x                      12p⁶              2c⁶           −224a⁶
I_x                       4p⁶               0             −32a⁶




As is shown in [1], the combined set of points

$$S = S(p, p, 0, 0) + S(c, 0, 0, 0) + S(a, a, a, a)$$

will form a third order rotatable arrangement if

$$E_x(S) = F_x(S) = G_x(S) = H_x(S) = I_x(S) = 0.$$

From Table I, this will happen when $2c^4 - 32a^4 = 0$, $12p^6 + 2c^6 - 224a^6 = 0$ and
$4p^6 - 32a^6 = 0$, i.e., when $c^2 = 4a^2$ and $p^2 = 2a^2$. The radii of the three separate
sets which comprise the arrangement are $(2)^{1/2}p$, $c$ and $2a$, all of which equal $2a$.
It follows [1] that the 48 points given by

$$S(a\sqrt{2}, a\sqrt{2}, 0, 0) + S(2a, 0, 0, 0) + S(a, a, a, a),$$

which we shall denote, as a whole, by $S(a)$, form a singular third order arrange-
ment for which $\lambda_6\lambda_2/\lambda_4^2 = (k+2)/(k+4) = 3/4$. Thus $S(a)$ cannot, by itself,
be used as a design. Now consider two such arrangements characterized by
$S(a_1)$ and $S(a_2)$. As has been shown [1], the combination of points $S(a_1) + S(a_2)$
will, if $a_1 \ne a_2$, form a third order rotatable arrangement which is non-singular
and so may be used as a design. For the whole set, $\lambda_2 N = 48(a_1^2 + a_2^2)$, where
$N = 96 + n_0$ and $n_0$ is the number of center points (if any) added. The condi-
tion $\lambda_2 = 1$ then implies that $a_1^2 + a_2^2 = 2 + n_0/48$. Thus, given $a_1$ with

$$0 < a_1^2 < 1 + n_0/96,$$

we have a rotatable design of third order in four dimensions given by the
$(96 + n_0)$ points of $S(a_1) + S(a_2)$, where $a_1$ is chosen from the range given
above and $a_2 = [2 + n_0/48 - a_1^2]^{1/2}$.
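The construction can be checked exactly. The Python sketch below uses hypothetical helper names and stores only squared coordinates. This avoids rounding from $p = a(2)^{1/2}$, and the odd moments in (1.2) vanish anyway by sign symmetry. It verifies the even-moment equalities of (1.2) for one representative choice of factors, which suffices here because the sets are symmetric in the four factors. It then computes $\lambda_6\lambda_2/\lambda_4^2$ for a single $S(a)$ and for $S(a_1) + S(a_2)$. The values $a_1^2 = 1/2$, $a_2^2 = 3/2$, $n_0 = 0$ are an illustrative choice satisfying $a_1^2 + a_2^2 = 2$.

```python
from fractions import Fraction
from itertools import combinations


def S_arrangement(a2):
    """Squared coordinates of the 48 points of S(a), given a2 = a**2."""
    pts = []
    for i, j in combinations(range(4), 2):   # S(p, p, 0, 0), p**2 = 2 a**2: 4 sign patterns each
        v = [0, 0, 0, 0]
        v[i] = v[j] = 2 * a2
        pts += [tuple(v)] * 4
    for i in range(4):                        # S(c, 0, 0, 0), c**2 = 4 a**2: 2 signs each
        v = [0, 0, 0, 0]
        v[i] = 4 * a2
        pts += [tuple(v)] * 2
    pts += [(a2, a2, a2, a2)] * 16            # S(a, a, a, a): 16 sign patterns
    return pts


def even_moments(sq_pts):
    """Sums over the points for factors 1, 2, 3 (indices 0, 1, 2), as they appear in (1.2)."""
    return {
        "4": sum(p[0] ** 2 for p in sq_pts),
        "22": sum(p[0] * p[1] for p in sq_pts),
        "6": sum(p[0] ** 3 for p in sq_pts),
        "42": sum(p[0] ** 2 * p[1] for p in sq_pts),
        "222": sum(p[0] * p[1] * p[2] for p in sq_pts),
    }


def is_third_order_rotatable(sq_pts):
    """sum x^4 = 3 sum x^2x^2 and sum x^6 = 5 sum x^4x^2 = 15 sum x^2x^2x^2."""
    m = even_moments(sq_pts)
    return m["4"] == 3 * m["22"] and m["6"] == 5 * m["42"] == 15 * m["222"]


def lambdas(sq_pts, n_center=0):
    """lambda2, lambda4, lambda6 from (1.2); N includes any center points."""
    N = len(sq_pts) + n_center
    m = even_moments(sq_pts)
    return (Fraction(sum(p[0] for p in sq_pts), N),
            Fraction(m["22"], N),
            Fraction(m["222"], N))


if __name__ == "__main__":
    l2, l4, l6 = lambdas(S_arrangement(1))
    print("single S(a):", l6 * l2 / l4**2)
    design = S_arrangement(Fraction(1, 2)) + S_arrangement(Fraction(3, 2))
    l2, l4, l6 = lambdas(design)
    print("S(a1) + S(a2):", l2, l4, l6, l6 * l2 / l4**2)
```

With this choice the complete 96-point design has $\lambda_2 = 1$, $\lambda_4 = 5/6$ and $\lambda_6 = 7/12$, in agreement with the parameter formulas for the complete design.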

Each of the four sets of 24 points given by $S(a_i\sqrt{2}, a_i\sqrt{2}, 0, 0)$ and
$S(2a_i, 0, 0, 0) + S(a_i, a_i, a_i, a_i)$, $(i = 1, 2)$, is itself a rotatable arrangement
of second order in four dimensions. These are, of course, the second order ar-
rangements most frequently used by experimenters for four factors. Hence the
completion of the 96 point design provides an excellent way of proceeding from
a second order investigation when the second order polynomial is found to be
inadequate. More generally, any one, two, or three of the sets may be performed
(with center points if necessary to satisfy the condition $\lambda_4/\lambda_2^2 > k/(k+2) = 2/3$)
as the first part of a sequential third order design. Alternatively, the four sets of
24 points may be performed together as four separate blocks of a third order
design. The complete design has parameters

$$\lambda_2 = 48(a_1^2 + a_2^2)/N = 1, \qquad \lambda_4 = 32(a_1^4 + a_2^4)/N, \qquad \lambda_6 = 16(a_1^6 + a_2^6)/N.$$


REFERENCES 
[1] Norman R. Draper, "Third order rotatable designs in three dimensions," Ann. Math.
Stat., Vol. 31 (1960), pp. 865-874.
[2] D. A. Gardiner, A. H. E. Grandage, and R. J. Hader, "Third order rotatable de-
signs for exploring response surfaces," Ann. Math. Stat., Vol. 30 (1959), pp.
1082-1096.





SOME ASPECTS OF WEIGHING DESIGNS 


By Damaraju Raghavarao
University of Bombay 


1. Summary. In a previous paper [8] the author proved that the $P_N$ and $S_N$
matrices are the most efficient weighing designs obtainable under Kishen's
definition of efficiency [5], when $N$ is odd and $N \equiv 2 \pmod 4$ respectively, sub-
ject to the conditions

(i) The variances of the estimated weights are equal;

(ii) The estimated weights are equally correlated.

In this paper, assuming the above conditions, it is proved that the $P_N$ matrices
are the best weighing designs under the definitions of Mood [6] and Ehrenfeld
[2] when $N$ is odd, while the $S_N$ matrices are the best weighing designs under the
definition of Ehrenfeld when $N \equiv 2 \pmod 4$. Under Mood's definition of effi-
ciency, the best weighing design $X$, when $N \equiv 2 \pmod 4$, is shown to be that
for which $X'X = (N - 2)I_N + 2E_{NN}$, where $I_N$ is the $N$th order identity matrix
and $E_{NN}$ is the $N$th order square matrix with positive unit elements everywhere.
By applying the Hasse-Minkowski invariant, a necessary condition for the
existence of the $S_N$ matrices is obtained, and the impossibility of the $S_N$ matrices
of orders 22, 34, 58 and 78 is shown.


2. Introduction. Suppose we are given N objects to be weighed in N weighings 
with a chemical balance having no bias. Let 


xij = 1, if the jth object is placed in the left pan in the ith weighing; 
—1, if the jth object is placed in the right pan in the ith weighing; 
0, if the jth object is not weighed in the ith weighing. 
The Nth order matrix X = (z,;) is known as the design matrix. Also, let y; 


be the result recorded in the ith weighing; e,; the error in this result and w; the 
true weight of the jth object, so that we have the N equations 


(2.1) Taw, + ZL QWe + ae + LinWn = YE + €i, a = & 2, = >. a N. 


If $X$ is non-singular, the method of least squares (the theory of linear estimation) gives the estimated weights $\hat w_j$ by the equation


(2.2) $$\hat{\mathbf{w}} = S^{-1}X'\mathbf{y},$$


where $\mathbf{y}$ is the column vector of the observations, $\hat{\mathbf{w}}$ is the column vector of the estimated weights and $S = X'X$.
If $\sigma^2$ is the variance of each weighing, then


(2.3) $$\operatorname{Var}(\hat{\mathbf{w}}) = S^{-1}\sigma^2 = (C_{ij})\sigma^2,$$

where $(C_{ij})$ is the inverse matrix of $S$. Hotelling [3] proved that the minimum minimorum of the variance of each estimated weight is $\sigma^2/N$.

Received July 6, 1959; revised March 29, 1960.
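As an illustration of (2.2) and (2.3), here is a minimal Python sketch using exact rational arithmetic. It applies the least-squares formula to a 4 × 4 Hadamard matrix, a design chosen here only for illustration, and assumes error-free readings ($e_i = 0$). For this design $S = 4I_4$, so each variance comes out as $\sigma^2/4 = \sigma^2/N$, which is the bound attributed to Hotelling above. The helper names (`matmul`, `inverse`, `estimate`) are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def estimate(X, y):
    """Least-squares weights w_hat = S^{-1} X' y with S = X'X, as in (2.2).
    Also returns S^{-1}; multiplying it by sigma^2 gives Var(w_hat), as in (2.3)."""
    Xt = transpose(X)
    S_inv = inverse(matmul(Xt, X))
    w_hat = matmul(S_inv, matmul(Xt, [[v] for v in y]))
    return [row[0] for row in w_hat], S_inv

# A 4 x 4 Hadamard matrix used as a design (+1 = left pan, -1 = right pan).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]
true_w = [3, 5, 2, 7]                                          # hypothetical weights
y = [sum(x * w for x, w in zip(row, true_w)) for row in H4]    # error-free readings
w_hat, S_inv = estimate(H4, y)
print(w_hat)                              # recovers the true weights exactly
print([S_inv[i][i] for i in range(4)])    # each diagonal entry is 1/4, i.e. sigma^2 / N
```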

Mood considers as best that weighing design which gives the smallest corresponding joint confidence region for the estimated weights. Consider a set of confidence intervals $C_0$ for the parameter $\theta$, typified by $\delta_0$, obeying the condition that $P(\delta_0 \ni \theta \mid \theta) = \alpha$, where we write $\delta_0 \ni \theta$ to mean that $\delta_0$ contains $\theta$. Let $C_1$ be some other set of confidence intervals for the parameter $\theta$, typified by $\delta_1$, such that $P(\delta_1 \ni \theta \mid \theta) = \alpha$. If now, for every $C_1$ and for any value $\theta'$ other than the true value, $P(\delta_0 \ni \theta' \mid \theta) \le P(\delta_1 \ni \theta' \mid \theta)$, then $C_0$ is said to give the smallest confidence intervals (cf. Neyman [7]). Since, for normally distributed errors, the volume of the joint confidence ellipsoid for the weights is proportional to $|C_{ij}|^{1/2}$, a design will be called optimum in the sense of Mood if the determinant of the matrix $(C_{ij})$ is minimum. But $|C_{ij}| = 1/|S|$, so $|C_{ij}|$ is minimum when $|S|$ is maximum. Thus the efficiency of a weighing design $X$ can be measured, in the sense of Mood, by


(2.4) $$\det(S)/\max \det(S).$$


If $\lambda_{\min}$ is the minimum of the distinct characteristic roots of $S$, then the efficiency of the weighing design $X$ can be measured, in the sense of Ehrenfeld, by


(2.5) $$\lambda_{\min}/N.$$


Throughout this paper we assume the conditions

(i) the variances of the estimated weights are equal;

(ii) the estimated weights are equally correlated.


3. Most efficient designs when $N$ is odd and $N \equiv 2 \pmod 4$ under the definitions of efficiency of Ehrenfeld and Mood. With the conditions assumed in Section 2, the matrix $S$ takes the form


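(3.1) $$S = (r - \lambda)I_N + \lambda E_{NN},$$

where $r$ is the common diagonal element of $S$ and $\lambda$ its common off-diagonal element.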

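Now

(3.2) $$\det(S) = (r - \lambda)^{N-1}\{r + \lambda(N-1)\}.$$

Since $X$ is real and non-singular, $S$ is positive definite; as the elements of $S$ are integers and $r \le N$, we have

(3.3) $$r > \lambda \ge 0, \quad \text{or} \quad r = N,\ \lambda = -1.$$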


Therefore, in this paper we consider only those values of $r$ and $\lambda$ satisfying (3.3).

Replacing $r$ in (3.1) by $(r - z)$ and equating the value of $\det(S)$ to zero, that is, solving $\det(S - zI_N) = 0$, we get $(r - \lambda)$ and $\{r + \lambda(N-1)\}$ as the distinct characteristic roots of $S$, with multiplicities $(N-1)$ and 1 respectively, when $\lambda \ne 0$. If $\lambda = 0$, $r$ is the only distinct characteristic root and it has multiplicity $N$. In either case, among the distinct characteristic roots, $(r - \lambda)$ is always the minimum except when $r = N$, $\lambda = -1$, in which case $r + \lambda(N-1) = 1$ is the minimum characteristic root. Hence, from (2.5), we measure the efficiency of a weighing design $X$ satisfying (3.1), under Ehrenfeld's definition of efficiency, by

(3.4) $$\begin{cases} (r - \lambda)/N, & r > \lambda \ge 0; \\ 1/N, & r = N,\ \lambda = -1. \end{cases}$$
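The characteristic roots quoted above are easy to confirm numerically. The sketch below uses hypothetical helpers that are not from the paper. It builds $S$ of the form (3.1) and checks that the all-ones vector and $e_1 - e_2$ are eigenvectors with roots $r + \lambda(N-1)$ and $r - \lambda$. It then confirms (3.2) exactly and evaluates the Ehrenfeld efficiency derived above, which is $(r - \lambda)/N$, or $1/N$ when $r = N$, $\lambda = -1$.

```python
from fractions import Fraction

def S_matrix(N, r, lam):
    """S = (r - lam) I_N + lam E_NN: diagonal r, off-diagonal lam, as in (3.1)."""
    return [[r if i == j else lam for j in range(N)] for i in range(N)]

def apply(S, v):
    """Matrix-vector product S v."""
    return [sum(a * b for a, b in zip(row, v)) for row in S]

def det(A):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return d

def ehrenfeld_efficiency(N, r, lam):
    """Smallest distinct characteristic root of S, divided by N."""
    return Fraction(min(r - lam, r + lam * (N - 1)), N)

N, r, lam = 7, 7, 1
S = S_matrix(N, r, lam)
ones = [1] * N
diff = [1, -1] + [0] * (N - 2)                      # e_1 - e_2
assert apply(S, ones) == [(r + lam * (N - 1)) * x for x in ones]
assert apply(S, diff) == [(r - lam) * x for x in diff]
assert det(S) == (r - lam) ** (N - 1) * (r + lam * (N - 1))   # (3.2)
print(ehrenfeld_efficiency(N, r, lam))   # (r - lam)/N = 6/7
print(ehrenfeld_efficiency(N, N, -1))    # 1/N = 1/7 when r = N, lam = -1
```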
Using the method and Lemma 2.1 of [8] we can easily prove the following two 
theorems: 
THEOREM 3.1. For Ehrenfeld's definition of efficiency, the best weighing design $X$, when $N$ is odd, is that for which

(3.5) $$S = (N-1)I_N + E_{NN}.$$


THEOREM 3.2. For Ehrenfeld's definition of efficiency, the best weighing design $X$, when $N \equiv 2 \pmod 4$ and $N \ne 2$, is that for which

(3.6) $$S = (N-1)I_N.$$
If we let $f_2(r, \lambda)$ denote the value of $\det(S)$, we have the following lemma.

LEMMA 3.1. For $r > \lambda \ge 0$,

(i) $f_2(r, \lambda)$ is a monotonic increasing function of $r$ for fixed $\lambda$, and

(ii) $f_2(r, \lambda)$ is a monotonic decreasing function of $\lambda$ for fixed $r$.

The lemma can easily be proved by partially differentiating $f_2(r, \lambda)$ with respect to $r$ and $\lambda$ and examining the signs of the derivatives.
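Carrying out the suggested differentiation gives $\partial f_2/\partial r = N(r-\lambda)^{N-2}\{r + \lambda(N-2)\}$ and $\partial f_2/\partial \lambda = -N(N-1)\lambda(r-\lambda)^{N-2}$. This is our own computation, not quoted from the paper. The first derivative is positive and the second is non-positive whenever $r > \lambda \ge 0$. The sketch below checks these signs, together with the monotonicity itself, on an exact rational grid. It is a sanity check, not a proof.

```python
from fractions import Fraction

def f2(N, r, lam):
    """det(S) = (r - lam)^(N-1) {r + lam (N - 1)}, equation (3.2)."""
    return (r - lam) ** (N - 1) * (r + lam * (N - 1))

def df_dr(N, r, lam):
    # Hand-derived partial derivative: N (r - lam)^(N-2) {r + lam (N - 2)}
    return N * (r - lam) ** (N - 2) * (r + lam * (N - 2))

def df_dlam(N, r, lam):
    # Hand-derived partial derivative: -N (N - 1) lam (r - lam)^(N-2)
    return -N * (N - 1) * lam * (r - lam) ** (N - 2)

step = Fraction(1, 4)
grid = [k * step for k in range(33)]            # 0, 1/4, ..., 8
for N in range(3, 10):
    for r in grid:
        for lam in grid:
            if not r > lam:
                continue
            assert df_dr(N, r, lam) > 0 and df_dlam(N, r, lam) <= 0
            # (i) increasing in r for fixed lam
            assert f2(N, r + step, lam) > f2(N, r, lam)
            # (ii) decreasing in lam for fixed r (staying inside r > lam)
            if r > lam + step:
                assert f2(N, r, lam + step) < f2(N, r, lam)
print("Lemma 3.1 sign and monotonicity checks pass on the grid")
```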
We now prove 


THEOREM 3.3. For Mood's definition of efficiency, the best weighing design $X$, when $N$ is odd, is that whose $S$ is given by (3.5).


PROOF. Since $\max \det(S)$ is not known, we prove that $\det(S)$, where $S$ is given by (3.5), is greater than $\det(S)$ for all other possible $S$. Now,

(3.7) $$f_2(N, 1) - f_2(N-1, 0) = N(N-1)^{N-1} > 0.$$
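Again,

(3.8) $$\begin{aligned} f_2(N, 1) - f_2(N, -1) &= (N-1)^{N-1}(2N-1) - (N+1)^{N-1} \\ &= 2N(N-1)^{N-1} - 2\left\{N^{N-1} + \binom{N-1}{2}N^{N-3} + \binom{N-1}{4}N^{N-5} + \cdots + 1\right\}, \end{aligned}$$

since $N - 1$ is even and the terms of odd order cancel in $(N+1)^{N-1} + (N-1)^{N-1}$. But

(3.9) $$\frac{(N+1)^{N-1} + (N-1)^{N-1}}{(N-1)^{N-1}} = 1 + \left(1 + \frac{2}{N-1}\right)^{N-1} < 2N,$$

because the middle expression equals 5 when $N = 3$ and is less than $1 + e^2 < 10 \le 2N$ when $N \ge 5$.

Hence the last expression of (3.8) is greater than zero and we have

(3.10) $$f_2(N, 1) > f_2(N, -1).$$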


Also, we know from Lemma 2.1 of [8] that $\lambda$ cannot be zero when $r = N$, since $N$ is odd. Therefore, from the inequalities (3.7) and (3.10) and Lemma 3.1, we see that $\det(S)$ is maximum when $S$ is given by (3.5). This completes the proof.
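A brute-force check of (3.7) and (3.10) over a range of odd $N$, using exact integer arithmetic. The upper limit of the range is an arbitrary choice.

```python
def f2(N, r, lam):
    """det(S) = (r - lam)^(N-1) {r + lam (N - 1)}, equation (3.2)."""
    return (r - lam) ** (N - 1) * (r + lam * (N - 1))

for N in range(3, 202, 2):                                            # odd N only
    assert f2(N, N, 1) - f2(N, N - 1, 0) == N * (N - 1) ** (N - 1)    # (3.7)
    assert f2(N, N, 1) > f2(N, N, -1)                                 # (3.10)
print("(3.7) and (3.10) hold for every odd N from 3 to 201")
```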

THEOREM 3.4. For Mood's definition of efficiency, the best weighing design $X$, when $N \equiv 2 \pmod 4$ and $N \ne 2$, is that for which

(3.11) $$S = (N-2)I_N + 2E_{NN}.$$


PROOF.

(3.12) $$\begin{aligned} f_2(N, 2) - f_2(N-1, 0) &= (N-2)^{N-1}(3N-2) - (N-1)^N \\ &= (N-1)^N\left[\left\{1 - \frac{1}{N-1}\right\}^{N-1}\left\{3 + \frac{1}{N-1}\right\} - 1\right]. \end{aligned}$$
Considering the inequality

(3.13) $$t < -\log(1-t) < \frac{t}{1-t}, \qquad 0 < t < 1,$$

and substituting $t = 1/(N-1)$, we can easily show that

(3.14) $$\left\{1 - \frac{1}{N-1}\right\}^{N-1} > \exp\left\{-\frac{N-1}{N-2}\right\}.$$

Making use of (3.14), (3.12) is greater than zero if

(3.15) $$3\exp\left\{-\frac{N-1}{N-2}\right\} - 1 > 0.$$

We easily see that (3.15) is true for $N > 12$, that is, whenever $(N-1)/(N-2) < \log 3$. We also see that $f_2(N, 2) > f_2(N-1, 0)$ for $N = 5, 6, \ldots, 12$ by actual substitution. Thus we have

(3.16) $$f_2(N, 2) > f_2(N-1, 0) \qquad \text{for } N \ge 5.$$
The only value of $N \le 4$ with $N \equiv 2 \pmod 4$ is 2, and in this case we know that the Hadamard matrix provides the optimum weighing design. Hence, if we delete this case, we see that $f_2(N, 2) > f_2(N-1, 0)$ whenever $N \equiv 2 \pmod 4$.

We know from Lemma 2.1 of [8] that $\lambda$ cannot be equal to $\pm 1$ when $r = N \equiv 2 \pmod 4$. Also, as no Hadamard matrix exists in this case, $\lambda$ cannot be equal to zero when $r = N$. Thus, from Lemma 3.1 and the inequality (3.16), we see that $\det(S)$ is maximum when $S$ is given by (3.11). Thus the theorem is proved.
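The comparison (3.16) and the bound (3.14) can both be checked directly. The sketch below evaluates $f_2$ exactly with Python integers. Over the range tried, $f_2(N, 2) > f_2(N-1, 0)$ fails only for $N = 2, 3, 4$. The check of (3.14) uses floating point. The range limits are arbitrary choices.

```python
import math

def f2(N, r, lam):
    """det(S) = (r - lam)^(N-1) {r + lam (N - 1)}, equation (3.2)."""
    return (r - lam) ** (N - 1) * (r + lam * (N - 1))

# (3.16): r = N, lam = 2 against r = N - 1, lam = 0, compared exactly in integers.
holds = [N for N in range(2, 200) if f2(N, N, 2) > f2(N, N - 1, 0)]
print(min(holds), holds == list(range(5, 200)))   # 5 True

# (3.14): the lower bound obtained from (3.13) with t = 1/(N - 1), in floating point.
for N in range(3, 200):
    t = 1 / (N - 1)
    assert (1 - t) ** (N - 1) > math.exp(-(N - 1) / (N - 2))
```

Only the values $N \equiv 2 \pmod 4$ matter for Theorem 3.4; the scan simply covers every $N$ in the range.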


The proof of the following theorem is similar to that of Theorem 3.1 of [8].

THEOREM 3.5. A necessary condition for the existence of a weighing design $X$ satisfying (3.11) is that

(3.17) $$N = \frac{4 + (3f^2 + 4)^{1/2}}{3},$$

where $f$ is an integer.







An application of the above theorem shows that a weighing design $X$ satisfying (3.11) can exist only for $N = 6$ and $N = 66$ among all $N < 200$ with $N \equiv 2 \pmod 4$.
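On our reading, (3.17) arises as follows; this is not necessarily the author's proof. The quantity $\det(S) = (N-2)^{N-1}(3N-2)$ must equal $\det(X)^2$, which is a perfect square. Since $N - 1$ is odd, this reduces to $(N-2)(3N-2) = f^2$ for an integer $f$, and that equation rearranges to (3.17). The sketch below uses this equivalent form to scan $N < 200$ with $N \equiv 2 \pmod 4$ and $N \ne 2$.

```python
from math import isqrt

def satisfies_3_17(N):
    """N = (4 + sqrt(3 f^2 + 4)) / 3 for an integer f, i.e. (N - 2)(3N - 2) is a square."""
    q = (N - 2) * (3 * N - 2)
    return isqrt(q) ** 2 == q

# N = 6, 10, 14, ..., 198: every N = 2 (mod 4) below 200 except N = 2.
candidates = [N for N in range(6, 200, 4) if satisfies_3_17(N)]
print(candidates)   # [6, 66]
```

Because (3.17) is only a necessary condition, the output lists orders where such a design can exist, not orders where it is known to exist.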

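For $N = 6$, the best weighing design $X$ satisfying (3.11) is

(3.18) $$X = \begin{pmatrix} 1 & -1 & -1 & -1 & -1 & -1 \\ 1 & -1 & 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & 1 & 1 & 1 \\ 1 & 1 & 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 & -1 \end{pmatrix}.$$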
If we adopt the above design to weigh 6 objects, then

(3.19) variance of each estimated weight $= 7\sigma^2/32$, and covariance of each pair of estimated weights $= -\sigma^2/32$.
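The sketch below is a numerical check of (3.11) and (3.19). It assumes the following 6 × 6 design: the first column is all +1, the first row is (1, −1, −1, −1, −1, −1), and rows 2 to 6 each have a single −1 on the diagonal of columns 2 to 6. This matrix satisfies (3.11), but it should be compared against the original print. The inverse is formed with the identity $(aI + bE)^{-1} = a^{-1}\{I - b(a + Nb)^{-1}E\}$ and is then verified.

```python
from fractions import Fraction

N = 6
# Assumed design: row 1 = (1, -1, ..., -1); rows 2-6 = 1 followed by J - 2I (5 x 5).
X = [[1, -1, -1, -1, -1, -1]] + [
    [1] + [-1 if j == i else 1 for j in range(5)] for i in range(5)
]
S = [[sum(X[k][i] * X[k][j] for k in range(N)) for j in range(N)] for i in range(N)]
assert S == [[6 if i == j else 2 for j in range(N)] for i in range(N)]   # (3.11)

# S = a I + b E with a = 4, b = 2; inverse is (1/a)(I - b/(a + N b) E).
a, b = 4, 2
C = [[Fraction(int(i == j), a) - Fraction(b, a * (a + N * b)) for j in range(N)]
     for i in range(N)]
assert all(sum(S[i][k] * C[k][j] for k in range(N)) == int(i == j)
           for i in range(N) for j in range(N))                          # C = S^{-1}
print(C[0][0], C[0][1])   # 7/32 -1/32 (variance and covariance in units of sigma^2)
```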


4. Some known results about the Legendre symbol, the Hilbert norm residue symbol and the Hasse-Minkowski invariant. The Legendre symbol $(a/p)$ is defined, for odd primes $p$ and integers $a$ not divisible by $p$, as

(4.1) $$(a/p) = \begin{cases} +1, & \text{if } a \text{ is a quadratic residue of } p; \\ -1, & \text{if } a \text{ is a quadratic non-residue of } p. \end{cases}$$
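Euler's criterion, $a^{(p-1)/2} \equiv (a/p) \pmod p$, gives a quick way to evaluate (4.1). The sketch below cross-checks it against a brute-force list of quadratic residues for a few small primes. The function names are illustrative.

```python
def legendre(a, p):
    """(a/p) for an odd prime p and a not divisible by p, via Euler's criterion."""
    if a % p == 0:
        raise ValueError("a must not be divisible by p")
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def residues(p):
    """Brute-force set of non-zero quadratic residues mod p, for cross-checking."""
    return {x * x % p for x in range(1, p)}

for p in [3, 5, 7, 11, 13]:
    for a in range(1, p):
        assert legendre(a, p) == (1 if a in residues(p) else -1)
print(legendre(-1, 3), legendre(-1, 5))   # -1 1
```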


A slight generalisation of the Legendre symbol is the Hilbert norm residue symbol $(a, b)_p$. If $a$ and $b$ are non-zero rational numbers, we define $(a, b)_p$ to have the value $+1$ or $-1$ according as the congruence

(4.2) $$ax^2 + by^2 \equiv 1 \pmod{p^r}$$

has or has not, for every value of $r$, rational solutions $x_r$ and $y_r$. Here $p$ is any prime, including the conventional prime $p_\infty = \infty$. Many properties of $(a, b)_p$ are given by Jones [4] and Shrikhande [9].

Let $A = (a_{ij})$ be any $n \times n$ symmetric matrix with rational elements. The matrix $B$ is said to be rationally congruent to $A$, written $A \sim B$, provided there exists a non-singular matrix $C$ with rational elements such that $A = CBC'$, where $C'$ is the transpose of $C$. If $D_i$ $(i = 1, 2, \ldots, n)$ denotes the leading principal minor determinant of order $i$ in the matrix $A$, then, if none of the $D_i$ vanish, the quantity

(4.3) $$C_p = C_p(A) = (-1, -D_n)_p \prod_{i=1}^{n-1} (D_i, -D_{i+1})_p$$

is invariant for all matrices rationally congruent to $A$. $C_p$ is known as the Hasse-Minkowski invariant.


The following lemma, given by Bose and Connor [1], will be of use in the next section.

LEMMA 4.1. If $t$ is a rational number and $A_n = tI_n$, then

(4.4) $$C_p(A_n) = (-1, -1)_p\,(t, -1)_p^{\,n(n+1)/2}.$$
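To experiment with (4.3) and Lemma 4.1, the sketch below evaluates $(a, b)_p$ for non-zero integers. In place of the congruence definition (4.2), it uses the standard closed-form expressions for $p$ odd, $p = 2$ and $p_\infty$ (as in Serre, *A Course in Arithmetic*, Ch. III). The infinite prime is represented by a hypothetical constant `INF`. The sketch then checks Lemma 4.1 for $A_n = tI_n$ over a few integer values of $t$, $n$ and $p$.

```python
INF = None   # stands for the infinite prime p_infinity

def split(a, p):
    """Write the non-zero integer a as p**alpha * u with u not divisible by p."""
    alpha = 0
    while a % p == 0:
        a //= p
        alpha += 1
    return alpha, a

def legendre(u, p):
    """(u/p) by Euler's criterion; u must not be divisible by the odd prime p."""
    return 1 if pow(u, (p - 1) // 2, p) == 1 else -1

def hilbert(a, b, p):
    """Hilbert symbol (a, b)_p for non-zero integers a, b (closed-form expressions)."""
    if p is INF:
        return -1 if a < 0 and b < 0 else 1
    alpha, u = split(a, p)
    beta, v = split(b, p)
    if p == 2:
        def eps(x):
            return ((x - 1) // 2) % 2
        def omega(x):
            return ((x * x - 1) // 8) % 2
        e = eps(u) * eps(v) + alpha * omega(v) + beta * omega(u)
        return -1 if e % 2 else 1
    sign = -1 if (alpha * beta * ((p - 1) // 2)) % 2 else 1
    return sign * legendre(u, p) ** beta * legendre(v, p) ** alpha

def hasse_minkowski(D, p):
    """C_p of (4.3), given the leading principal minors D = [D_1, ..., D_n]."""
    c = hilbert(-1, -D[-1], p)
    for i in range(len(D) - 1):
        c *= hilbert(D[i], -D[i + 1], p)
    return c

# Lemma 4.1: for A_n = t I_n the leading principal minors are D_i = t**i.
for t in [-6, -3, -1, 1, 2, 3, 5, 12]:
    for n in range(1, 7):
        D = [t ** i for i in range(1, n + 1)]
        for p in [2, 3, 5, 7, 11, INF]:
            rhs = hilbert(-1, -1, p) * hilbert(t, -1, p) ** (n * (n + 1) // 2)
            assert hasse_minkowski(D, p) == rhs
print(hilbert(-1, -1, 2), hilbert(-1, -1, 3), hilbert(-1, -1, INF))   # -1 1 -1
```

The check covers only integer $t$; Lemma 4.1 itself allows any rational $t$.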







5. On the impossibilities of the $S_N$ matrices. Since the $S_N$ matrix is a square matrix with rational elements and $\det(S_N) \ne 0$, its inverse exists and is also a matrix with rational elements. Thus $I_N = (S_N^{-1})(S_N S_N)(S_N^{-1})$. From the last section, we see that $I_N$ and $S_N S_N$ are rationally congruent, which we write as $S_N S_N \sim I_N$. Hence

(5.1) $$C_p(S_N S_N) = C_p(I_N) = (-1, -1)_p.$$

But

(5.2) $$S_N S_N = (N-1)I_N.$$

From Lemma 4.1, we see that

(5.3) $$C_p(S_N S_N) = (-1, -1)_p\,(N-1, -1)_p^{\,N(N+1)/2}.$$

But, as $N \equiv 2 \pmod 4$, $N(N+1)/2$ is odd and (5.3) reduces to

(5.4) $$C_p(S_N S_N) = (-1, -1)_p\,(N-1, -1)_p.$$

Equating the right-hand sides of (5.1) and (5.4), we have, for all primes $p$,

(5.5) $$(N-1, -1)_p = +1.$$


This result can be stated in the form of the following theorem.

THEOREM 5.1. A necessary condition for the existence of the $S_N$ matrix, where $N \equiv 2 \pmod 4$, is that $(N-1, -1)_p = +1$ for all primes $p$.

ILLUSTRATION 5.1.1. When $N = 22$,

$$(N-1, -1)_p = (21, -1)_p = (3, -1)_p\,(7, -1)_p = -1 \qquad \text{for } p = 3.$$

Theorem 5.1 is violated and $S_{22}$ does not exist. The non-existence of $S_{34}$, $S_{58}$ and $S_{78}$ can also easily be shown by applying Theorem 5.1.
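Theorem 5.1 can be checked mechanically. The sketch below uses the standard closed-form Hilbert symbol as a stand-in for the congruence definition (4.2), and lists the places $p$ at which $(N-1, -1)_p = -1$. Only $p = 2$, $p_\infty$ (the hypothetical constant `INF`) and the prime factors of $N - 1$ need to be examined. At every other odd prime, both arguments are $p$-adic units and the symbol equals $+1$.

```python
INF = None   # stands for the infinite prime p_infinity

def split(a, p):
    """Write the non-zero integer a as p**alpha * u with u not divisible by p."""
    alpha = 0
    while a % p == 0:
        a //= p
        alpha += 1
    return alpha, a

def legendre(u, p):
    return 1 if pow(u, (p - 1) // 2, p) == 1 else -1

def hilbert(a, b, p):
    """Hilbert symbol (a, b)_p for non-zero integers a, b (closed-form expressions)."""
    if p is INF:
        return -1 if a < 0 and b < 0 else 1
    alpha, u = split(a, p)
    beta, v = split(b, p)
    if p == 2:
        def eps(x):
            return ((x - 1) // 2) % 2
        def omega(x):
            return ((x * x - 1) // 8) % 2
        e = eps(u) * eps(v) + alpha * omega(v) + beta * omega(u)
        return -1 if e % 2 else 1
    sign = -1 if (alpha * beta * ((p - 1) // 2)) % 2 else 1
    return sign * legendre(u, p) ** beta * legendre(v, p) ** alpha

def prime_factors(n):
    """Distinct prime factors of n > 1, by trial division."""
    out, d = [], 2
    while d * d <= n:
        if n % d == 0:
            out.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        out.append(n)
    return out

def violating_primes(N):
    """Places p with (N - 1, -1)_p = -1, i.e. where Theorem 5.1 fails."""
    places = sorted(set(prime_factors(N - 1)) | {2}) + [INF]
    return [p for p in places if hilbert(N - 1, -1, p) == -1]

for N in [22, 34, 58, 78]:
    print(N, violating_primes(N))
```

For $N = 22, 34, 58$ and $78$ this reports $[3, 7]$, $[3, 11]$, $[3, 19]$ and $[7, 11]$ respectively. By contrast, $N = 6$ gives an empty list, so Theorem 5.1 does not exclude that order.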


Acknowledgments. My sincere thanks are due to Professor M. C. Chakrabarti 
for his kind guidance in preparing this paper. I am also thankful to the referee 
and the Editor for their suggestions on the original manuscript of this paper. 


REFERENCES 


[1] R. C. Bose and W. S. Connor, “Combinatorial properties of group divisible incomplete block designs,” Ann. Math. Stat., Vol. 23 (1952), pp. 367-383.

[2] Sylvain Ehrenfeld, “On the efficiencies of experimental designs,” Ann. Math. Stat., Vol. 26 (1955), pp. 247-255.

[3] Harold Hotelling, “Some improvements in weighing and other experimental techniques,” Ann. Math. Stat., Vol. 15 (1944), pp. 297-305.

[4] B. W. Jones, The Arithmetic Theory of Quadratic Forms, John Wiley and Sons, New York, 1950.

[5] K. Kishen, “On the design of experiments for weighing and making other types of measurements,” Ann. Math. Stat., Vol. 16 (1945), pp. 294-300.

[6] Alexander M. Mood, “On Hotelling's weighing problem,” Ann. Math. Stat., Vol. 17 (1946), pp. 432-446.

[7] J. Neyman, “Outline of a theory of statistical estimation based on the classical theory of probability,” Phil. Trans. Roy. Soc. London, Vol. 236 (1937), pp. 333-380.

[8] Damaraju Raghavarao, “Some optimum weighing designs,” Ann. Math. Stat., Vol. 30 (1959), pp. 295-303.

[9] S. S. Shrikhande, “The uniqueness of the $L_2$ association scheme,” Ann. Math. Stat., Vol. 30 (1959), pp. 781-798.








RANDOM ALLOCATION DESIGNS I: ON GENERAL CLASSES
OF ESTIMATION METHODS¹


By A. P. Dempster 
Harvard University 


1. Summary. Certain linear estimation procedures for randomized experi- 
mental designs are evaluated relative to the criteria of bias, variance and mean 
square error. For the designs considered, treatment combinations are randomly 
allocated to experimental units, the randomness being subject only to a wide 
symmetry condition. Statistical properties refer to the discrete probabilities in- 
duced by the randomization hypothesis. Section 2 defines the basic statistical 
model and discusses the question of conditional inference relative to this model. 
Certain vectorial notation and terminology are introduced in Section 3. Although the theory of the paper applies directly to k-factor designs with general k, the notation is set up in Section 3 for a three-factor design, and the three-factor notation is used throughout, except for Section 5 which discusses an even simpler
example. Two general classes of linear unbiased estimators are defined in Section 
4 and illustrated in Section 5. In Section 6 it is shown that estimators of the 
types defined in Section 4 have optimum properties in a wide class of linear es- 
timators. Finally, the theory for the basic model is generalized in Section 7 to 
cover the case of observations with error. 

Formal proofs of stated theorems are to be found at the ends of Sections 4.2, 
4.3 and 6. 


2. Introduction. Consider a completely crossed k-factor design with R levels of factor 1, C levels of factor 2, ..., L levels of factor k. In recent years F. E. Satterthwaite [6] has proposed designing experiments by drawing at “random” from such an array, usually with some restrictions on the random choice, plus some replication, and then performing the experiments indicated by the random choice of treatment combinations. This has been termed “random balance experimentation” or “random allocation experimentation.” For example, an experiment with n observations could be designed by choosing independently n sets of treatment combinations each according to the following simple rule: the level of factor 1 is chosen at random from the R possibilities, independently a choice of one of the C levels of factor 2 is made, and so on. The general technique appears to have been introduced primarily for application to very large arrays where only a very small fraction can be contemplated, and so is often thought of as a competitor of highly fractionated factorials which deliberately confound
certain effects with others. In the case of random allocation designs, confounding is random, or at least partly random. The proper place in practice of random balance designs relative to more conventional designs is a controversial matter (see [1], [3], [6], [10]). The results presented in this paper were motivated by the need for a theoretical framework within which random balance designs could be compared and evaluated relative to more conventional designs. The results pertain, however, to a general class of models which can be applied not only to the random balance experiments referred to above but also to a wide range of other designs including (i) fixed fractions where the labels of the different levels of each factor are assigned at random, (ii) arrays which are complete except for randomly selected “missing cells,” and (iii) conventional “completely randomized designs” as discussed in [9]. Subsequently in this paper the term random allocation will be used rather than random balance, partly because the wider applicability than simply to Satterthwaite's random balance designs makes a more neutral term desirable, and partly because the term random allocation seems to be more descriptive.

Received May 25, 1959; revised June 27, 1960.

¹ This research has been supported in part by the United States Navy through the Office of Naval Research under contract Nonr 1866(37). Reproduction in whole or in part is permitted for any purpose of the United States Government.

The basic statistical model is as follows. Corresponding to each of the $N = R \times C \times \cdots \times L$ cells of the complete k-way design there exists a fixed quantity to be thought of as the result of an experiment performed with the corresponding factor level combinations. By a design we mean a subset of n of the N factor level combinations, and an experiment performed using a given design provides the values of the n fixed quantities corresponding to the n cells of the design. For the mathematics of this paper a random allocation scheme is a method of selecting a design as a random subset of n of the N possible factor level combinations where the only restriction on the probability of selection of a particular set of n is that all other sets of n obtained from a given set by permutation of factor levels shall have the same probability of selection. It will be convenient to use group-theoretic language in dealing with this definition of random allocation. Suppose $G$ denotes the group of all permutations of the levels within all factors, so that $G$ has order

$$P = (R!)(C!)\cdots(L!).$$
By applying all the elements of $G$ to a given design one obtains a set of designs which we call a symmetry class of designs under group $G$. In this way all $\binom{N}{n}$ possible designs are classified into mutually exclusive symmetry classes. Our
definition of a random allocation scheme states that all the designs of any one 
symmetry class must be equiprobable, but no restriction is placed on the prob- 
abilities of selection of the different symmetry classes. The different weights 
allowable for different symmetry classes result in the wide range of possible types 
of designs alluded to above. Possibly the simplest example occurs when the n 
selected combinations are a simple random sample without replacement from 
the N possible combinations, and this we call simple random allocation. Two 
types of modifications of simple random allocation which may appear separately 
or in combination may be termed random allocation with partial balance and 
random allocation with partial confounding. An often recommended example of
the first type is defined by the restriction of the random choice so as to require
for each factor that each of its levels appear an equal (or as near equal as possible) 
number of times. It is clearly possible to extend this technique to balancing with 
regard to combinations of factors rather than simply with regard to factors one 
at a time. For example, in an experiment of size n = 32 on an array with 20 
factors each at 2 levels, one could divide the factors into 4 groups of 5 each and 
balance the experiment with regard to 4 complete $2^5$ experiments, i.e., the experiment would be a complete $2^5$ in 4 ways, the 4 ways being randomly superposed on one another. Partially confounded random allocation, as defined here, would arise if one were to choose a fixed fractional factorial from the whole array where certain effects are deliberately confounded, subsample at random in some sense from the fixed fraction, and finally randomly permute the labels of rows, columns, etc., by choosing at random a member of $G$. Clearly partial balance can be built into the second of the three stages of choice of a partially confounded random allocation design. Alternatively, if the second stage of choice is omitted, partially confounded random allocation includes any complete standard fractional factorial provided that the experimenter has seen fit to randomly permute the labels of factor levels independently for each factor. Thus any comparison between, say, simple random allocation and a prechosen-then-randomized fraction can be made entirely within our class of random allocation designs. A final example, to show the breadth of our definition, is the class of completely randomized designs where $t$ treatment combinations are applied at random to $t$ experimental units chosen at random from $r \ge t$ experimental units. This is clearly a random allocation design where the $t$ treatment combinations are regarded as the $t$ levels of one factor and the $r$ experimental units are regarded as the $r$ levels of a second factor.
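To make symmetry classes concrete, the sketch below enumerates all $\binom{N}{n}$ designs for a small array and groups them into orbits under $G$. For a 2 × 2 array with $n = 2$ there are three classes: both cells in one row, both cells in one column, or the two cells on a diagonal. Each class contains two designs. The function name and array sizes are illustrative choices.

```python
from itertools import combinations, permutations, product

def symmetry_classes(levels, n):
    """Group all n-cell designs of a complete factorial (levels[f] levels for factor f)
    into orbits under G, the group of independent level permutations within factors."""
    cells = list(product(*[range(k) for k in levels]))
    group = list(product(*[list(permutations(range(k))) for k in levels]))  # |G| = P
    unseen = {frozenset(d) for d in combinations(cells, n)}
    classes = []
    while unseen:
        design = next(iter(unseen))
        orbit = {frozenset(tuple(g[f][c[f]] for f in range(len(levels))) for c in design)
                 for g in group}
        unseen -= orbit
        classes.append(orbit)
    return classes

classes = symmetry_classes([2, 2], 2)
print(len(classes), sorted(len(c) for c in classes))   # 3 [2, 2, 2]
```

Under simple random allocation each of the $\binom{N}{n}$ designs has probability $1/\binom{N}{n}$. A general random allocation scheme may instead give each class its own total probability, provided that probability is spread evenly over the designs within the class.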

A generalization of the basic model will be considered in Section 7. The theory 
extends immediately to this generalization, but in the interest of clarity the 
main presentation will use the basic model. The generalization allows the fixed 
array to become random through the addition of a random error with zero mean 
and arbitrary variances and covariances. The generalization can be made to 
cover certain methods of design where some of the cells are replicated in the 
design. 

Our aims are to provide, for the basic statistical model, methods of estimating 
linear combinations of the N fixed cell-values, and to search for optimum methods 
of estimation. The same discussion will apply to all of the types of random allocation schemes within our definition. The criteria for good estimators will be the usual criteria, “unbiasedness,” “unbiasedness with minimum variance” and “minimum mean square error.” These criteria have been placed in quotes to
emphasize that they are not yet well-defined and indeed that they may be de- 
fined in several ways. Controversy over what statistical properties may be prop- 
erly associated with randomized designs has a long history as may be seen in the 
opposing contentions of Fisher and “Student” [2], [8], and such issues do not 
appear to be definitely resolved even today. The question usually comes down 
to: how conditional should the inference be? One point of view is that, having
made our design random, it is only sensible to use this randomness, by averaging
over the random choice of design, when defining statistical properties like means 
and variances. The opposing point of view states that the randomness in the 
choice of design does not depend on the unknown quantities to be estimated, 
i.e., that the chosen design is an ancillary statistic in the sense of Fisher [5], 
and so we must make inferences conditional on the design actually used. Curi- 
ously enough the two opposing principles, namely the principle of basing in- 
ferences on the random properties of randomized designs and the principle that 
one must make inferences conditional on ancillary statistics, are both asso- 
ciated with the name of Fisher. The author believes the latter principle to be 
desirable in theory but not always practicable. In the case of our basic model, where n of N fixed constants are provided by the random allocation scheme, to condition on the design is to eliminate all randomness from the model. There are two methods of restoring randomness to the model. The first method, and the one adopted for our theory, is to relax the conditioning requirement. The second method is to assume a more structured model for the observations; for example, we might assume a Model I (fixed effects) analysis of variance model with fewer
than n fixed effects to be estimated. The choice of method poses a dilemma, for 
the logically more satisfying second method may yield incorrect results because 
the more structured model makes incorrect assumptions. 

In this paper we develop theory for the general unstructured model and take 
averages over the random choice of design. We can also, however, take one step 
towards conditioning on the design and condition on the observed symmetry 
class of designs, i.e., rather than average over all designs we can just average 
over the observed symmetry class. This conditioning is equivalent to the as- 
sumption that the random allocation scheme used has just the observed sym- 
metry class with probability one. Since all of the designs of one symmetry class 
have similar confounding patterns the procedure of conditioning on the ob- 
served symmetry class has the intuitive appeal of averaging only over designs 
with confounding patterns similar to that observed. In any case we will be dealing with estimators which are unbiased in one of two senses: they may be (i) $U_1$-unbiased, i.e., conditionally unbiased given the symmetry class, or (ii) $U_2$-unbiased, i.e., unbiased under averaging over symmetry classes as well as within. Clearly any $U_1$-unbiased estimator is also $U_2$-unbiased.

The consequences of adopting the method of averaging over designs may at 
first appear startling. For example it becomes possible to find unbiased estimates 
of every linear combination of the N fixed quantities for every random allocation design, even with n = 1. Suppose the N fixed quantities are denoted by $v_1, v_2, \ldots, v_N$. Then a random allocation experiment with n = 1 amounts to observing one of the $v_i$ chosen at random; since all N one-cell designs form a single symmetry class, each is observed with probability 1/N. For each i an unbiased estimator of $v_i$ can be defined as

$$\hat v_i = \begin{cases} N v_i & \text{if } v_i \text{ is observed}, \\ 0 & \text{otherwise}. \end{cases}$$

Consequently an unbiased estimator of $\sum_{i=1}^{N} a_i v_i$ is $\sum_{i=1}^{N} a_i \hat v_i$. Of course, such an estimator based on n = 1 would have so large a variance that it would be useless, but it still retains theoretical validity.
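The unbiasedness of the $n = 1$ estimator can be checked by exact averaging over the $N$ equally likely one-cell designs. The cell values and coefficients below are hypothetical.

```python
from fractions import Fraction

def average_estimate(v, a):
    """Exact average, over the N equally likely one-cell designs, of sum_i a_i * vhat_i,
    where vhat_i = N v_i if cell i is the observed cell and 0 otherwise."""
    N = len(v)
    total = Fraction(0)
    for observed in range(N):                    # each cell observed with probability 1/N
        vhat = [N * v[i] if i == observed else 0 for i in range(N)]
        total += Fraction(1, N) * sum(ai * vi for ai, vi in zip(a, vhat))
    return total

v = [4, -1, 7, 2, 0, 3]        # hypothetical fixed cell values
a = [1, -1, 0, 0, 2, 5]        # coefficients of the target linear combination
assert average_estimate(v, a) == sum(ai * vi for ai, vi in zip(a, v))
```

Each single realization of the estimate is $N a_i v_i$ for the one observed cell, which is far from the target; that spread is the large variance mentioned in the text.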


3. Terminology and notation. Although the theory applies directly to de- 
signs with any number of factors, suppose, to simplify notation, that we use as 
prototype a design with three factors having R, C and L levels and so N = RCL 
cells altogether. Suppose the cells have associated numbers 


$$v_{ijk} \qquad (i = 1, 2, \ldots, R;\ j = 1, 2, \ldots, C;\ k = 1, 2, \ldots, L)$$

to be regarded as fixed quantities or parameters. The quantities $v_{ijk}$, together with all linear combinations of these quantities, may be taken as the values assumed by a particular linear functional $f_t$ over an N-dimensional vector space E. The vector space E is defined abstractly in terms of N basis vectors

$$V_{ijk} \qquad (i = 1, 2, \ldots, R;\ j = 1, 2, \ldots, C;\ k = 1, 2, \ldots, L),$$

which are in one to one correspondence with the cells of the basic array; and $f_t$, called the total functional, is defined by

$$f_t\Big(\sum_{i,j,k} a_{ijk} V_{ijk}\Big) = \sum_{i,j,k} a_{ijk} v_{ijk}.$$

In particular $f_t(V_{ijk}) = v_{ijk}$ for all i, j and k. The random allocation experiment provides observations for a random subset of n of the N cells. Suppose the corresponding n vectors $V_{ijk}$ span the subspace $E_s$ of E. Note that the experiment provides the values of $f_t(V)$ for $V \in E_s$ only.

An alternative method of introducing vectorial terminology would be to regard the $v_{ijk}$ as defining an N-dimensional vector. Such a vector lies in the dual space (see [7]) of the vector space E introduced in the preceding paragraph. Since we wish to work with vectors in E we prefer the terminology which calls the set of $v_{ijk}$ a linear functional rather than a vector.

A Euclidean metric may be inserted in E by regarding each $V_{ijk}$ to be a unit vector orthogonal to all other $V_{i'j'k'}$, i.e., the metric is such that the vector

$$\sum_{i,j,k} a_{ijk} V_{ijk}$$

has squared length $\sum_{i,j,k} a_{ijk}^2$. This metric will be referred to throughout as the formal metric and, unless otherwise stated, orthogonality relationships and lengths will be relative to the formal metric.

In accordance with standard analysis of variance ideas, the space E can be expressed as the direct sum of eight mutually orthogonal subspaces, $E_M, E_R, E_C, E_L, E_{RC}, E_{RL}, E_{CL}$ and $E_{RCL}$, of dimensions respectively 1, R − 1, C − 1, L − 1, (R − 1)(C − 1), (R − 1)(L − 1), (C − 1)(L − 1) and

(R − 1)(C − 1)(L − 1).

For example, the space $E_{RC}$ consists of all vectors $\sum_{i,j,k} a_{ijk} V_{ijk}$ such that analysis of variance of the array $a_{ijk}$ produces zero mean squares for all effects except possibly the RC interactions. Similar definitions apply to the other subspaces.
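For a concrete view of the eight subspaces, the sketch below splits an array into its analysis of variance components by inclusion-exclusion over marginal means. It then checks that the pieces add back up to the array and are mutually orthogonal in the formal metric. The array shape (2, 3, 2) and its entries are hypothetical. The function works for any number of factors.

```python
from itertools import combinations, product
from statistics import fmean

def marginal_mean(v, shape, keep, cell):
    """Mean of v over all cells that agree with `cell` on the factors in `keep`."""
    ranges = [[cell[f]] if f in keep else range(shape[f]) for f in range(len(shape))]
    return fmean(v[c] for c in product(*ranges))

def anova_components(v, shape):
    """Split v (dict cell -> value) into one piece per subset A of the factors:
    piece_A = sum over B subset of A of (-1)^(|A|-|B|) * (mean over factors not in B).
    With three factors these are the components in E_M, E_R, ..., E_RCL."""
    k = len(shape)
    cells = list(product(*[range(s) for s in shape]))
    subsets = [frozenset(s) for r in range(k + 1) for s in combinations(range(k), r)]
    return {
        A: {c: sum((-1) ** (len(A) - len(B)) * marginal_mean(v, shape, B, c)
                   for B in subsets if B <= A)
            for c in cells}
        for A in subsets
    }

shape = (2, 3, 2)                                  # hypothetical R = 2, C = 3, L = 2
cells = list(product(*[range(s) for s in shape]))
v = {c: float((3 * c[0] + c[1] ** 2 - c[2] + c[0] * c[1] * c[2]) % 5) for c in cells}
comps = anova_components(v, shape)

for c in cells:                                    # the pieces add back up to v
    assert abs(sum(piece[c] for piece in comps.values()) - v[c]) < 1e-9
for A, B in combinations(comps, 2):                # pairwise orthogonal (formal metric)
    assert abs(sum(comps[A][c] * comps[B][c] for c in cells)) < 1e-9
print(len(comps), "mutually orthogonal components")
```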


A class of metrics may be introduced in E by stretching or shrinking the space along the subspaces $E_M, \ldots, E_{RCL}$. More precisely, suppose $V \in E$ is expressed as

$$V = a_M V_M + a_R V_R + \cdots + a_{RCL} V_{RCL},$$

where $V_M \in E_M, \ldots, V_{RCL} \in E_{RCL}$ and these are unit vectors according to the formal metric. Then, according to the λ-metric defined by $(\lambda_M, \lambda_R, \ldots, \lambda_{RCL})$, V has squared λ-length

$$a_M^2 \lambda_M + \cdots + a_{RCL}^2 \lambda_{RCL}.$$

Note that the formal metric is the particular λ-metric where $\lambda_M = \lambda_R = \cdots = \lambda_{RCL} = 1$. Metrical properties relative to a general λ-metric will be referred to as λ-properties, e.g., a vector has a λ-length or a pair of vectors have a λ-angle.
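Computationally, the squared λ-length is a weighted sum of the formal squared lengths of a vector's analysis of variance components. To keep the sketch short it uses a two-factor $R \times C$ array, with components M, R, C and RC, rather than the three-factor notation of the text. The coefficients and λ values are hypothetical.

```python
def components_2way(a):
    """Split an R x C array (list of rows) into its M, R, C and RC components."""
    R, C = len(a), len(a[0])
    m = sum(map(sum, a)) / (R * C)
    row = [sum(r) / C - m for r in a]
    col = [sum(a[i][j] for i in range(R)) / R - m for j in range(C)]
    return {
        "M": [[m] * C for _ in range(R)],
        "R": [[row[i]] * C for i in range(R)],
        "C": [list(col) for _ in range(R)],
        "RC": [[a[i][j] - m - row[i] - col[j] for j in range(C)] for i in range(R)],
    }

def sq_length(x):
    """Formal squared length: sum of squared entries."""
    return sum(e * e for r in x for e in r)

def lambda_sq_length(a, lam):
    """Squared lambda-length: sum over A of lam[A] * (formal squared length of the
    component of a in E_A)."""
    return sum(lam[A] * sq_length(x) for A, x in components_2way(a).items())

a = [[1, 2, 3], [4, 6, 8]]                          # hypothetical coefficients a_ij
formal = {"M": 1.0, "R": 1.0, "C": 1.0, "RC": 1.0}
stretched = {"M": 1.0, "R": 10.0, "C": 1.0, "RC": 0.1}
print(lambda_sq_length(a, formal), sq_length(a))    # 130.0 130
print(lambda_sq_length(a, stretched))
```

Under the formal metric the two lengths agree, because the components are orthogonal. Stretching $E_R$ by $\lambda_R = 10$ inflates the contribution of the row effects tenfold.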


4. General classes of estimation methods. 

4.1. Motivation. Our purpose is to find unbiased estimators of $f_t(V)$ for any V based on data giving the values of $f_t(V)$, where V belongs to the random subspace $E_s$. The values of $f_t(V)$ for V belonging to one of the subspaces

$$E_M, \ldots, E_{RCL}$$

have special interpretations and are of special interest. For example a typical $V \in E_R$ is

$$V_0 = \frac{1}{CL}\sum_{j,k}\left(V_{i_1jk} - V_{i_2jk}\right)$$

and its associated $f_t$ value, namely

$$v_0 = \frac{1}{CL}\sum_{j,k}\left(v_{i_1jk} - v_{i_2jk}\right),$$

is of special interest as the difference of two row main effects. A brief discussion of methods of estimating $v_0$ will help to motivate subsequent general methods.

The first unbiased estimator of $v_0$ which comes to mind is probably the difference of two means, the mean of those $v_{ijk}$ which were observed with $i = i_1$, and the mean of those with $i = i_2$. Provided that each possible design yields at least one observation in every row, this estimator is evidently both defined and $U_1$-unbiased for any random allocation scheme. If, however, it is suspected that large column effects are present, and if the design permits, one would probably apply the foregoing method to the data corrected for column main effects. In both cases these estimators are values of $f_t$ for vectors in $E_s$: in the first case for a vector in $E_s$ which is (i) perpendicular to $E_M$, and (ii) perpendicular to that part of $E_s$ orthogonal to $V_0$, and which makes the smallest angle with $V_0$ subject to conditions (i) and (ii). In the second case a third condition must be added: (iii) perpendicular to $E_C$. Both of these methods and their natural extensions are discussed by Anscombe [1]. The first method avoids any influence of the grand mean on an estimate, whereas the second method avoids any influence of the grand mean or column effects. Extensions to the correction for other effects are evident. This example is intended to illustrate the following heuristic viewpoint: in estimating an effect $v_0$ one will use $f_t(V)$ for a vector V in $E_s$ chosen as a compromise between being near $V_0$, and not too near other directions with large associated effects. In the example, the device for keeping away from dangerous directions is to require the used direction to be orthogonal to dangerous directions.

A generalization of this device with much greater flexibility is to use a vector in $E_s$ which makes minimum λ-angle with $V_0$ for the λ-metric $(\lambda_M, \ldots, \lambda_{RCL})$. The two methods described above may be shown to be limiting cases of this general method where

(a) $\lambda_M = \lambda_R \to \infty$ and $\lambda_C = \lambda_L = \cdots = \lambda_{RCL} = 1$,
and
(b) $\lambda_M = \lambda_R = \lambda_C \to \infty$ and $\lambda_L = \cdots = \lambda_{RCL} = 1$, respectively.

The general method was motivated by the belief that it would be better to stop between these extremes. In fact the author's heuristic feeling led him to conjecture that one should stretch (or shrink) the metric until the effects corresponding to λ-unit vectors in the directions $E_M, \ldots, E_{RCL}$ are in some sense equalized, or more specifically until

$$\lambda_M = (MS)_M, \quad \lambda_R = (MS)_R, \quad \ldots, \quad \lambda_{RCL} = (MS)_{RCL},$$

where the (MS) values are the mean squares appearing in an analysis of variance table for the complete array $v_{ijk}$. The sense in which such a procedure is optimum will be indicated in Section 6.

There is one direction in $E_s$ λ-nearest to $V_0$, but there are many vectors in this direction. The question of which to use in defining the estimator has up to now been mostly ignored in our discussion. We now proceed to more precise definitions of estimators.

4.2. λ-minimum extensions. The objective is to find an unbiased ($U_1$- or $U_2$-) estimator of $f_t(V)$ for all $V \in E$. We shall consider only estimators which are themselves linear functionals over E, i.e., if $\hat f_t(V_1)$ estimates $f_t(V_1)$ and $\hat f_t(V_2)$ estimates $f_t(V_2)$, then the estimator of $f_t(\alpha V_1 + \beta V_2)$ is $\alpha \hat f_t(V_1) + \beta \hat f_t(V_2)$. The estimators which we shall consider are conveniently expressed in terms of a random linear functional denoted by $f_\lambda(V)$ and called the λ-minimum extension of $f_t$ from $E_s$ to E.

The linear functional $f_\lambda$ is determined by the observations and the λ-metric $(\lambda_M, \ldots, \lambda_{RCL})$. It is defined by

$$f_\lambda(V) = \begin{cases} f_t(V) & \text{for } V \in E_s, \\ 0 & \text{for } V\ \lambda\text{-orthogonal to } E_s. \end{cases}$$







These two statements define $f_\lambda$ completely, since any $V \in E$ has a unique expression as $V_1 + V_2$, where $V_1 \in E_s$ and $V_2$ is λ-orthogonal to $E_s$, and so

$$f_\lambda(V) = f_\lambda(V_1) + f_\lambda(V_2) = f_t(V_1).$$

It is clear that an alternative characterization of $f_\lambda$ is to define $f_\lambda(V)$ for any $V \in E$ to be $f_t(V_1)$, where $V_1$ is that vector in $E_s$ at minimum λ-distance from V. This characterization ties in with the description in Section 4.1 of estimators of $v_0$ as $f_t$ values for vectors in $E_s$ nearest to $V_0$.

A third, and quite different, characterization of $f_\lambda$ is as follows. The values of
$f_\lambda$ for $V_{ijk}$ corresponding to the unobserved cells are those numbers which, together
with the known $f_\lambda$ ($= f_n$) values for $V_{ijk}$ corresponding to the observed cells, produce
that full array which minimizes the expression
$$\frac{(SS)_M}{\lambda_M^2} + \frac{(SS)_R}{\lambda_R^2} + \dots + \frac{(SS)_{RCL}}{\lambda_{RCL}^2},$$
where $(SS)_M, \dots, (SS)_{RCL}$ are the sums of squares arising from the analysis of
variance of the full array. To prove that this characterization agrees with the
first one given, consider the class $F_n$ of linear functionals which agree with $f_n$ for
$V \in E_n$ but are otherwise arbitrary. Pick any basis of $E$ consisting of λ-unit
and λ-orthogonal vectors $W_1, W_2, \dots, W_N$ and consider, for any $f \in F_n$, the
property
$$S(f) = \sum_{i=1}^{N} [f(W_i)]^2,$$


which may easily be checked to be independent of the choice of basis. According
to the first definition, $f_\lambda$ is that member of $F_n$ which is zero over the subspace
$\bar E_n$ of $E$, where $\bar E_n$ consists of those $V \in E$ λ-orthogonal to $E_n$. In terms of a basis
$W_1, \dots, W_N$ such that $W_1, \dots, W_n \in E_n$ and $W_{n+1}, \dots, W_N \in \bar E_n$, the terms of
$S(f)$ with $i \le n$ are fixed by $f_n$, so it is seen to be equivalent to say that $f_\lambda$ is that
member of $F_n$ which minimizes $S(f)$. On the other hand, when a basis $W_1, \dots, W_N$
is selected which lies entirely within the subspaces $E_M, E_R, \dots, E_{RCL}$, it becomes
evident that
$$S(f) = \frac{(SS)_M}{\lambda_M^2} + \frac{(SS)_R}{\lambda_R^2} + \dots + \frac{(SS)_{RCL}}{\lambda_{RCL}^2},$$
as required.
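As a rough numerical check of the third characterization, the sketch below fills the single unobserved cell of a $2^2$ example by minimizing $\sum_A (SS)_A/\lambda_A^2$; for a $2^2$ array each $(SS)_A$ is the square of the corresponding effect coordinate. The metric convention (squared λ-length $\sum_A \lambda_A^2 x_A^2$) and the helper name `fill_unobserved` are assumptions made for illustration. The filled array then gives $f_\lambda(V_R)$ directly.

```python
import numpy as np

# Rows: V_M, V_R, V_C, V_RC in cell coordinates (11, 12, 21, 22).
H = 0.5 * np.array([[1, 1, 1, 1], [1, 1, -1, -1],
                    [1, -1, 1, -1], [1, -1, -1, 1]], dtype=float)


def fill_unobserved(values, missing, lam):
    """Return the full array minimizing sum_A (SS)_A / lam_A**2.

    For one-degree-of-freedom effects (SS)_A = (H @ full)[A]**2, so the
    objective is ||D H full||^2 with D = diag(1/lam); it is quadratic in the
    single missing value and is minimized in closed form.
    """
    D = np.diag(1.0 / np.asarray(lam, dtype=float))
    full = np.array(values, dtype=float)
    full[missing] = 0.0
    a = D @ H @ full                          # objective vector with the missing value set to 0
    b = D @ H[:, missing]                     # how the missing value enters the objective
    full[missing] = -(a @ b) / (b @ b)        # stationary point of ||a + w b||^2
    return full


filled = fill_unobserved([8.5, 0.5, 2.5, float("nan")], 3, (5.0, 4.0, 6.0, 2.0))
print(filled[3], H[1] @ filled)               # filled v22 and f_lambda(V_R)
```

With these inputs the value of $f_\lambda(V_R)$ agrees with the one obtained from the λ-projection of $V_R$ onto $E_n$, as the argument above says it must.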

4.3. Unbiased estimators. For a given random allocation scheme and a given
λ-metric $(\lambda_M, \lambda_R, \dots, \lambda_{RCL})$ we define two unbiased estimators of $f_t(V)$, called
the class 1 estimator and the class 2 estimator. The class 1 estimator is $U_1$-unbiased
and the class 2 estimator is $U_2$-unbiased. The definitions of these estimators
rest on the following theorem, which is proved at the end of this section.

THEOREM. Suppose $A$ is a generic symbol representing one of the subscript
combinations $M, R, \dots, RCL$. Suppose the symbol “ave{⋯}” denotes the average
over the random choice of design of a random allocation scheme. Then there exist








RANDOM ALLOCATION DESIGNS. I 893 


constants $\gamma_M, \gamma_R, \dots, \gamma_{RCL}$, depending in general on the random allocation
scheme, such that
$$\text{ave}\{f_\lambda(V_A)\} = \gamma_A f_t(V_A)$$
for any $V_A \in E_A$ and for all $A$. Further, if $\lambda_M, \dots, \lambda_{RCL}$ are all finite and non-zero,
then $\gamma_M, \dots, \gamma_{RCL}$ are also all finite and non-zero.

The definition of the class 2 estimator follows immediately: any $V \in E$ has a
unique expression as
$$V = \sum_A V_A, \quad \text{where } V_A \in E_A,$$
and, from the theorem, the class 2 estimator defined as
$$\hat f_t(V) = \sum_A \gamma_A^{-1} f_\lambda(V_A)$$
is $U_2$-unbiased for $f_t(V)$ for all $V \in E$. The definition of the class 1 estimator
follows from an application of the theorem to the random allocation scheme
which restricts designs to a single symmetry class $G$ under $\mathcal{G}$. This results in a
set of constants $\gamma_{AG}$ for each symmetry class $G$. Then the class 1 estimator is
defined as
$$\tilde f_t(V) = \sum_A \gamma_{AG}^{-1} f_\lambda(V_A)$$
for any $V = \sum_A V_A \in E$, where $G$ denotes the observed symmetry class. The
class 1 estimator is clearly $U_1$-unbiased. In general the class 1 and class 2 estimators
are different, and the class 2 estimator is not $U_1$-unbiased. Notice, however,
that if the random allocation scheme allows only one symmetry class with probability
one, then the class 1 and class 2 estimators coincide, as do the concepts
of $U_1$- and $U_2$-unbiasedness.

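To prove the theorem stated above we shall express $\text{ave}\{f_\lambda(V_{ijk})\}$ successively
in terms of the following three sets of eight quantities each:
$$(v_{()}, v_{(i)}, v_{(j)}, \dots, v_{(ijk)}),$$
$$(v_{\cdots}, v_{i\cdot\cdot}, v_{\cdot j\cdot}, \dots, v_{ijk}),$$
$$(m_{\cdots}, m_{i\cdot\cdot}, m_{\cdot j\cdot}, \dots, m_{ijk}).$$
The first set represents a method of breaking $\sum_{i',j',k'} v_{i'j'k'}$ into eight parts,
according to which indices of a cell agree with $(i, j, k)$:
$$v_{()} = \sum_{i' \ne i,\, j' \ne j,\, k' \ne k} v_{i'j'k'}, \qquad v_{(i)} = \sum_{j' \ne j,\, k' \ne k} v_{ij'k'},$$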










etc. In the second set a dot means that a mean is taken over the corresponding
index, e.g.,
$$v_{\cdots} = \frac{1}{RCL}\sum_{i',j',k'} v_{i'j'k'}, \qquad v_{i\cdot\cdot} = \frac{1}{CL}\sum_{j',k'} v_{ij'k'},$$


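etc. The third set gives the standard representation of the observation $v_{ijk}$ in
terms of its mean effect, row effect, etc.:
$$m_{\cdots} = v_{\cdots}, \qquad m_{i\cdot\cdot} = v_{i\cdot\cdot} - v_{\cdots},$$
and so on to
$$m_{ijk} = v_{ijk} - v_{ij\cdot} - v_{i\cdot k} - v_{\cdot jk} + v_{i\cdot\cdot} + v_{\cdot j\cdot} + v_{\cdot\cdot k} - v_{\cdots}.$$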


It is easily seen that any one of these three sets can be expressed as linear functions
of any other of the three, where the coefficients do not depend on the particular
$i$, $j$ and $k$ involved. Thus
$$v_{\cdots} = \frac{1}{RCL}\left(v_{()} + v_{(i)} + \dots + v_{(ijk)}\right), \qquad v_{i\cdot\cdot} = \frac{1}{CL}\left(v_{(i)} + v_{(ij)} + v_{(ik)} + v_{(ijk)}\right),$$
etc.
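The third set of quantities can be made concrete with a short NumPy sketch (an illustration only; the function names are hypothetical). It computes the eight arrays of $m$-quantities for an $R \times C \times L$ array, checks that they add back to $v_{ijk}$, and forms the mean squares $(MS)_A$ that enter the heuristic choice of λ-metric in Section 4.1, using the usual degrees of freedom.

```python
import numpy as np


def effects(v):
    """Eight m-arrays (mean, main effects, interactions) of an R x C x L array.

    Each array has the full shape of v, so that m[A][i, j, k] is the
    A-component of v_ijk and the eight components add back to v.
    """
    g = v.mean()                                   # v_...
    i = v.mean(axis=(1, 2), keepdims=True)         # v_i..
    j = v.mean(axis=(0, 2), keepdims=True)         # v_.j.
    k = v.mean(axis=(0, 1), keepdims=True)         # v_..k
    ij = v.mean(axis=2, keepdims=True)             # v_ij.
    ik = v.mean(axis=1, keepdims=True)             # v_i.k
    jk = v.mean(axis=0, keepdims=True)             # v_.jk
    parts = {
        "M": g,
        "R": i - g, "C": j - g, "L": k - g,
        "RC": ij - i - j + g, "RL": ik - i - k + g, "CL": jk - j - k + g,
        "RCL": v - ij - ik - jk + i + j + k - g,
    }
    return {A: np.broadcast_to(p, v.shape) for A, p in parts.items()}


def mean_squares(v):
    """(MS)_A = (SS)_A / df_A for the complete array, with the usual degrees of freedom."""
    R, C, L = v.shape
    df = {"M": 1, "R": R - 1, "C": C - 1, "L": L - 1,
          "RC": (R - 1) * (C - 1), "RL": (R - 1) * (L - 1),
          "CL": (C - 1) * (L - 1), "RCL": (R - 1) * (C - 1) * (L - 1)}
    return {A: float((m ** 2).sum()) / df[A] for A, m in effects(v).items()}


rng = np.random.default_rng(0)
v = rng.normal(size=(2, 3, 4))
m = effects(v)
print(mean_squares(v))
```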

Now $\text{ave}\{f_\lambda(V_{ijk})\}$ is evidently expressible as a certain linear combination of
the $v_{i'j'k'}$. The symmetry under $\mathcal{G}$ of the defining property of all random allocation
schemes implies the equality of those coefficients in this linear combination
corresponding to $v_{i'j'k'}$ values comprising a particular one of the eight sums
$(v_{()}, v_{(i)}, \dots, v_{(ijk)})$. Hence there are just eight different coefficients.
The symmetry further implies that these eight coefficients do not depend on
$i$, $j$ and $k$. Going over to $m$-quantities, it follows that there exist constants
$\gamma_M, \gamma_R, \dots, \gamma_{RCL}$ independent of $i$, $j$ and $k$ such that
$$\text{ave}\{f_\lambda(V_{ijk})\} = \gamma_M m_{\cdots} + \gamma_R m_{i\cdot\cdot} + \dots + \gamma_{RCL} m_{ijk}.$$
By taking linear combinations of both sides it follows that
$$\text{ave}\{f_\lambda(V_A)\} = \gamma_A f_t(V_A)$$
for any $V_A \in E_A$.

Since $f_\lambda(V_A) = f_n(V)$ for that $V \in E_n$ which is λ-nearest to $V_A$, the contribution
to $\gamma_A f_t(V_A)$ from any particular $E_n$ must be a positive or zero multiple of
$f_t(V_A)$. Thus $\gamma_A = 0$ only if all $E_n$ are λ-orthogonal to $V_A$, which is in turn
possible only if all $V_{ijk}$ are λ-orthogonal to $V_A$. Since $V_A$ is a linear combination
of the $V_{ijk}$, this last cannot happen unless the λ-metric degenerates, e.g., by some
λ-values becoming zero or infinite. This completes the proof of the theorem.
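The theorem can be illustrated numerically on the $2^2$ array with simple random allocation (the cases worked out in Section 5). The sketch below assumes the same λ-metric convention as before (squared λ-length $\sum_A \lambda_A^2 x_A^2$) and uses hypothetical helper names. It averages $f_\lambda(V_A)$ over all equiprobable designs and divides by $f_t(V_A)$. If the theorem holds, the resulting ratio $\gamma_A$ should not depend on the cell values.

```python
import itertools

import numpy as np

H = 0.5 * np.array([[1, 1, 1, 1], [1, 1, -1, -1],
                    [1, -1, 1, -1], [1, -1, -1, 1]], dtype=float)


def f_lambda(c, observed, values, lam):
    """lambda-minimum extension: f_n applied to the lambda-projection of c onto E_n."""
    G = H.T @ np.diag(np.asarray(lam, dtype=float) ** 2) @ H
    B = np.eye(4)[:, list(observed)]
    coef = np.linalg.solve(B.T @ G @ B, B.T @ G @ c)
    return float(values[list(observed)] @ coef)


def gamma(effect, values, lam, n=3):
    """ave{f_lambda(V_A)} / f_t(V_A) under simple random allocation of n cells."""
    V_A = H[effect]
    designs = itertools.combinations(range(4), n)   # all equiprobable n-subsets
    avg = np.mean([f_lambda(V_A, s, values, lam) for s in designs])
    return avg / float(values @ V_A)


lam = (5.0, 4.0, 6.0, 2.0)
print(gamma(1, np.array([8.5, 0.5, 2.5, -1.5]), lam),
      gamma(1, np.array([1.0, 2.0, 3.0, 7.0]), lam))
```

Both calls should print the same number, which lies strictly between 0 and 1 for this finite, non-zero metric.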

In degenerate cases where $\gamma_A = 0$ for one or several $A$, our estimators are undefined.
Such cases can be of practical interest. In a later paper [4] we shall
discuss our estimators when several $\lambda_A$ tend to infinity and relate them
to well-known least squares estimators. The non-existence of our estimators









can in these cases be related to the inability to estimate by least squares more 
parameters than there are observations. 


5. Simple examples. No mention has yet been made of the difficulty of computing
general estimators of the kinds just defined. To compute $f_\lambda$ for a
given set of observations and a given arbitrary λ-metric in general requires that
the $n$ observations be orthogonalized with respect to the λ-metric. Further, no
general formula has been found for the correction factors $\gamma_A$ for unbiasedness,
which are different for different random allocation schemes and, as the theory
now stands, must be computed directly for each scheme. In this section we
illustrate the computation of estimates in the case of simple random allocation
designs with $n = 2$ and $n = 3$ from an underlying array with two factors at two
levels each, i.e., $N = 2^2$. The complexity of the general method is such that even
these very simple examples are not quite trivial, and so they serve to illustrate
the definitions of Section 4. Our objective is to find expressions for the class 1
and class 2 estimators and for their variances.

For the underlying $2^2$ array, the four fixed numbers $v_{ij}$ ($i = 1, 2$ and $j = 1, 2$)
define $f_t$ for the four unit orthogonal basis vectors $V_{ij}$ of $E$. The vectors
$$V_M = \tfrac12(V_{11} + V_{12} + V_{21} + V_{22}),$$
$$V_R = \tfrac12(V_{11} + V_{12} - V_{21} - V_{22}),$$
$$V_C = \tfrac12(V_{11} - V_{12} + V_{21} - V_{22}),$$
$$V_{RC} = \tfrac12(V_{11} - V_{12} - V_{21} + V_{22})$$
define unit vectors in the directions of mean effect, row effect, column effect
and row-column interaction effect. These directions define the one-spaces $E_M$,
$E_R$, $E_C$ and $E_{RC}$ respectively. The corresponding $f_t$ values $v_M$, $v_R$, $v_C$ and $v_{RC}$ can
be similarly expressed in terms of the $v_{ij}$. We shall denote the class 1 and class 2
estimators of $v_A$ by $\tilde v_A$ and $\hat v_A$, respectively, for each $A$.

Take first the case where the observations are a random sample of three from
the four possible cells. Here there are four possible samples of three, each with
probability one quarter. Since all four possible samples belong to one symmetry
class, class 1 and class 2 estimators are the same. Suppose the λ-metric $(\lambda_M, \lambda_R,
\lambda_C, \lambda_{RC})$ is adopted and we set out to find $f_\lambda(V_A)$ using the second characterization
of $f_\lambda$ given in Section 4.2. Thus we seek $V \in E_n$ λ-nearest to $V_A$. This need
only be done for one choice of $A$, say $R$, and one sample of three, say $v_{11}$, $v_{12}$ and
$v_{21}$, since it will follow by symmetry for other choices. Let us therefore find the
vector $a_{11}V_{11} + a_{12}V_{12} + a_{21}(-V_{21})$, subject to the restriction $a_{11} + a_{12} + a_{21} = 1$,
which makes the smallest λ-angle with $V_R$. Each of $V_{11}$, $V_{12}$ and $-V_{21}$ has the
same λ-inner product with $V_R$, and hence this same inner product is shared with
$a_{11}V_{11} + a_{12}V_{12} - a_{21}V_{21}$ whenever $a_{11} + a_{12} + a_{21} = 1$. Thus to minimize the λ-angle
of this vector with $V_R$ we need only minimize its λ-length. Its squared λ-length is
found, by summing the properly weighted squared components along $V_M$, $V_R$, $V_C$
and $V_{RC}$, to be
$$\tfrac14\left[\lambda_M^2(a_{11} + a_{12} - a_{21})^2 + \lambda_R^2(a_{11} + a_{12} + a_{21})^2 + \lambda_C^2(a_{11} - a_{12} - a_{21})^2 + \lambda_{RC}^2(a_{11} - a_{12} + a_{21})^2\right].$$


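It is easily seen that this expression is minimized when
$$a_{11} = K\left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_{RC}^2}\right), \quad a_{12} = K\left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_C^2}\right), \quad a_{21} = K\left(\frac{1}{\lambda_C^2} + \frac{1}{\lambda_{RC}^2}\right),$$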


where $K$ is a normalizing constant. By symmetry we could now write down the
coefficients for each of the other three possible samples, with the same normalizing
constant $K$. In order to define the class 1 or class 2 estimator $\hat v_R$, it is only necessary
to choose the factor $K$ to produce unbiasedness. By direct algebra this $K$ is found
to be
$$K = \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_C^2} + \frac{1}{\lambda_{RC}^2}\right)^{-1},$$


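so that
$$\hat v_R = \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_C^2} + \frac{1}{\lambda_{RC}^2}\right)^{-1}\left[\left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_{RC}^2}\right)v_{11} + \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_C^2}\right)v_{12} - \left(\frac{1}{\lambda_C^2} + \frac{1}{\lambda_{RC}^2}\right)v_{21}\right]$$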


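when $v_{11}$, $v_{12}$ and $v_{21}$ are observed. Similar expressions could be written for the
other three possible samples. Also by direct algebra we find
$$\operatorname{var}(\hat v_R) = \left(\frac{v_M^2}{\lambda_M^4} + \frac{v_C^2}{\lambda_C^4} + \frac{v_{RC}^2}{\lambda_{RC}^4}\right) \bigg/ \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_C^2} + \frac{1}{\lambda_{RC}^2}\right)^2.$$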
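The various symmetries permit deduction of the corresponding estimators
$\hat v_M$, $\hat v_C$ and $\hat v_{RC}$. For example, formulas for the case of $v_{11}$, $v_{12}$ and $v_{21}$ observed are
$$\hat v_M = \left(\frac{1}{\lambda_R^2} + \frac{1}{\lambda_C^2} + \frac{1}{\lambda_{RC}^2}\right)^{-1}\left[\left(\frac{1}{\lambda_R^2} + \frac{1}{\lambda_C^2}\right)v_{11} + \left(\frac{1}{\lambda_R^2} + \frac{1}{\lambda_{RC}^2}\right)v_{12} + \left(\frac{1}{\lambda_C^2} + \frac{1}{\lambda_{RC}^2}\right)v_{21}\right],$$
$$\hat v_C = \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_R^2} + \frac{1}{\lambda_{RC}^2}\right)^{-1}\left[\left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_{RC}^2}\right)v_{11} - \left(\frac{1}{\lambda_R^2} + \frac{1}{\lambda_{RC}^2}\right)v_{12} + \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_R^2}\right)v_{21}\right],$$
$$\hat v_{RC} = \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_R^2} + \frac{1}{\lambda_C^2}\right)^{-1}\left[\left(\frac{1}{\lambda_R^2} + \frac{1}{\lambda_C^2}\right)v_{11} - \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_C^2}\right)v_{12} - \left(\frac{1}{\lambda_M^2} + \frac{1}{\lambda_R^2}\right)v_{21}\right].$$
Corresponding formulas for $\operatorname{var}(\hat v_M)$, $\operatorname{var}(\hat v_C)$ and $\operatorname{var}(\hat v_{RC})$ could be immediately
written down.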
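The closed forms for $\hat v_R$ and its variance can be checked by brute force. The sketch below is an illustration under stated assumptions: it implements the $n = 3$ formula for the sample that omits $v_{22}$ and obtains the other three samples by row and column flips of the array (a row flip changes the sign of $v_R$; a column flip leaves it unchanged). The helper names are hypothetical.

```python
import numpy as np


def vhat_R_missing22(w, lam):
    """Class 2 estimate of v_R from w11, w12, w21 (w22 unobserved), n = 3 formula."""
    iM, _, iC, iRC = 1.0 / np.asarray(lam, dtype=float) ** 2   # 1 / lambda_A**2
    K = 1.0 / (iM + iC + iRC)
    return K * ((iM + iRC) * w[0, 0] + (iM + iC) * w[0, 1] - (iC + iRC) * w[1, 0])


def vhat_R(v, missing, lam):
    """Move the unobserved cell to position (2, 2) by flips, then apply the formula."""
    i, j = missing                      # 0-based row and column of the unobserved cell
    w, sign = v, 1.0
    if i == 0:                          # row flip: v_R -> -v_R
        w, sign = w[::-1, :], -sign
    if j == 0:                          # column flip: v_R unchanged
        w = w[:, ::-1]
    return sign * vhat_R_missing22(w, lam)


def effects(v):
    """(v_M, v_R, v_C, v_RC) for a 2x2 array."""
    return (v.sum() / 2, (v[0].sum() - v[1].sum()) / 2,
            (v[:, 0].sum() - v[:, 1].sum()) / 2,
            (v[0, 0] - v[0, 1] - v[1, 0] + v[1, 1]) / 2)


v = np.array([[8.5, 0.5], [2.5, -1.5]])
lam = np.array([5.0, 4.0, 6.0, 2.0])
est = [vhat_R(v, (i, j), lam) for i in range(2) for j in range(2)]   # 4 equiprobable samples
vM, vR, vC, vRC = effects(v)
iM, _, iC, iRC = 1.0 / lam ** 2
predicted = (vM**2 * iM**2 + vC**2 * iC**2 + vRC**2 * iRC**2) / (iM + iC + iRC) ** 2
print(np.mean(est), np.var(est), predicted)
```

The mean over the four samples recovers $v_R$, and the population variance over them matches the closed-form expression.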


If $\operatorname{var}(\hat v_R)$ is minimized over different choices of λ-metric $(\lambda_M, \dots, \lambda_{RC})$, it is
easily seen that the resulting minimum is
$$\left(\frac{1}{v_M^2} + \frac{1}{v_C^2} + \frac{1}{v_{RC}^2}\right)^{-1},$$
occurring when
$$\frac{|v_M|}{\lambda_M} = \frac{|v_C|}{\lambda_C} = \frac{|v_{RC}|}{\lambda_{RC}}.$$
Thus, by symmetry, the choice of a λ-metric such that
$$\frac{|v_M|}{\lambda_M} = \frac{|v_R|}{\lambda_R} = \frac{|v_C|}{\lambda_C} = \frac{|v_{RC}|}{\lambda_{RC}}$$
minimizes $\operatorname{var}(\hat v_A)$ for all $A$. This result agrees with the general theory of Section
6 and the heuristic feeling of Section 4.1. It is to be noted that this optimum is
not available as a practical method, since the choice of λ-metric depends on
unknown quantities.
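The minimization just quoted can be filled in by a short Cauchy-Schwarz argument. Start from the $n = 3$ expression $\operatorname{var}(\hat v_R) = \sum_A v_A^2 \lambda_A^{-4} / \big(\sum_A \lambda_A^{-2}\big)^2$, with all sums over $A = M, C, RC$, write $x_A = \lambda_A^{-2}$, and assume every $v_A \ne 0$:
$$\Big(\sum_A x_A\Big)^2 = \Big(\sum_A |v_A|\,x_A \cdot |v_A|^{-1}\Big)^2 \le \Big(\sum_A v_A^2 x_A^2\Big)\Big(\sum_A v_A^{-2}\Big),$$
$$\operatorname{var}(\hat v_R) = \frac{\sum_A v_A^2 x_A^2}{\big(\sum_A x_A\big)^2} \ge \left(\frac{1}{v_M^2} + \frac{1}{v_C^2} + \frac{1}{v_{RC}^2}\right)^{-1}.$$
Equality holds exactly when $|v_A|\,x_A$ is proportional to $|v_A|^{-1}$, i.e., $x_A \propto v_A^{-2}$, which is the condition that $|v_A|/\lambda_A$ is the same for $A = M, C, RC$.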

To illustrate the difference between class 1 and class 2 estimators we discuss
the case of samples of two from the $2^2$ array. Here, in the case of simple random
allocation, there are six equiprobable pairs of observations defining three symmetry
classes of size two each, say $G_1$, $G_2$ and $G_3$:

$G_1$: $(v_{11}, v_{12})$ and $(v_{21}, v_{22})$
$G_2$: $(v_{11}, v_{21})$ and $(v_{12}, v_{22})$
$G_3$: $(v_{11}, v_{22})$ and $(v_{12}, v_{21})$.


If a λ-metric $(\lambda_M, \dots, \lambda_{RC})$ is selected and we seek a $V \in E_n$ making minimum
λ-angle with $V_R$, we find that in the six cases such vectors are:
$$V_{11} + V_{12} = V_R + V_M, \qquad -V_{21} - V_{22} = V_R - V_M,$$
$$V_{11} - V_{21} = V_R + V_{RC}, \qquad V_{12} - V_{22} = V_R - V_{RC},$$
$$V_{11} - V_{22} = V_R + V_C, \qquad V_{12} - V_{21} = V_R - V_C.$$


In any scheme of weighting these to produce an unbiased estimator of $v_R$ it is
evidently necessary to weight the members of each pair of a symmetry class
equally, so the question is how to choose three coefficients $\alpha$, $\beta$ and $\gamma$, where
the resulting estimator is:
$$\alpha(v_{11} + v_{12}) = \alpha(v_R + v_M), \qquad \alpha(-v_{21} - v_{22}) = \alpha(v_R - v_M),$$
$$\beta(v_{11} - v_{21}) = \beta(v_R + v_{RC}), \qquad \beta(v_{12} - v_{22}) = \beta(v_R - v_{RC}),$$
$$\gamma(v_{11} - v_{22}) = \gamma(v_R + v_C), \qquad \gamma(v_{12} - v_{21}) = \gamma(v_R - v_C),$$
which, since each sample has probability 1/6, yields a $U_2$-unbiased estimator
whenever $\alpha + \beta + \gamma = 3$. The variance of this estimator is given by
$$\tfrac13\left[\alpha^2(v_R^2 + v_M^2) + \beta^2(v_R^2 + v_{RC}^2) + \gamma^2(v_R^2 + v_C^2)\right] - v_R^2.$$





For the class 2 estimator it is easily seen that
$$\alpha(\lambda_R^2 + \lambda_M^2) = \beta(\lambda_R^2 + \lambda_{RC}^2) = \gamma(\lambda_R^2 + \lambda_C^2),$$
whence, using $\alpha + \beta + \gamma = 3$, the class 2 estimator is defined. Also, if the above
general expression for the variance is minimized over choices of $\alpha$, $\beta$ and $\gamma$, it is
seen that the minimum variance occurs for
$$\alpha(v_R^2 + v_M^2) = \beta(v_R^2 + v_{RC}^2) = \gamma(v_R^2 + v_C^2),$$
indicating again that the metric with $|v_A|/\lambda_A$ constant produces optimum estimators.
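The $n = 2$ class 2 variances can be tabulated with a few lines of code. The sketch below is illustrative: it takes the class 2 weights to satisfy $\alpha(\lambda_A^2 + \lambda_B^2) = \text{const}$ within each symmetry class, where $B$ is the effect confounded with $A$ in that class (for $v_R$: $M$ in $G_1$, $RC$ in $G_2$, $C$ in $G_3$), normalizes the weights to sum to 3, and evaluates the variance expression above. The partner table and the function name are assumptions for illustration. With $\lambda_A$ proportional to $|v_A|$ it gives the optimum class 2 variances; with all $\lambda_A$ equal every weight is 1, which is the class 1 estimator.

```python
import numpy as np

# Effect order (M, R, C, RC).  PARTNER[g][A] is the effect confounded with A
# within symmetry class g: G1 pairs share a row, G2 pairs share a column,
# G3 pairs lie on a diagonal.
PARTNER = [[1, 0, 3, 2],      # G1: M<->R, C<->RC
           [2, 3, 0, 1],      # G2: M<->C, R<->RC
           [3, 2, 1, 0]]      # G3: M<->RC, R<->C


def class2_variance(A, v_eff, lam):
    """Variance of the n = 2 class 2 estimator of v_A under simple random allocation."""
    v_eff, lam = np.asarray(v_eff, dtype=float), np.asarray(lam, dtype=float)
    B = [PARTNER[g][A] for g in range(3)]
    w = np.array([1.0 / (lam[A] ** 2 + lam[b] ** 2) for b in B])
    w *= 3.0 / w.sum()                          # alpha + beta + gamma = 3
    d = np.array([v_eff[A] ** 2 + v_eff[b] ** 2 for b in B])
    return float((w ** 2 * d).sum() / 3.0 - v_eff[A] ** 2)


v_eff = [5.0, 4.0, 6.0, 2.0]                    # v_M, v_R, v_C, v_RC
print([round(class2_variance(A, v_eff, v_eff), 2) for A in range(4)])      # optimum metric
print([round(class2_variance(A, v_eff, [1.0] * 4), 2) for A in range(4)])  # equal lambdas
```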

For $U_1$-unbiasedness it is necessary that $\alpha = \beta = \gamma = 1$, and so this choice
produces the class 1 estimator. Note that this estimator happens to be independent
of the λ-metric. Its variance is $\tfrac13(v_M^2 + v_C^2 + v_{RC}^2)$. Since the class 1
estimator is conditionally unbiased given the symmetry class, it makes sense to
quote the conditional variances given the symmetry class, namely $v_M^2$, $v_{RC}^2$ or
$v_C^2$ given symmetry class $G_1$, $G_2$ or $G_3$, respectively.
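The conditional and unconditional variances of the class 1 estimators can be enumerated directly. In this sketch (illustrative; the names are hypothetical) each pair's class 1 estimate of $v_A$ is $2\sum_s V_A[s]\,v_s$ over the two observed cells $s$, which for $v_R$ reproduces the entries $v_{11} + v_{12}$, $-v_{21} - v_{22}$, and so on, listed above with $\alpha = \beta = \gamma = 1$.

```python
import numpy as np

# Rows: V_M, V_R, V_C, V_RC in cell coordinates (11, 12, 21, 22).
H = 0.5 * np.array([[1, 1, 1, 1], [1, 1, -1, -1],
                    [1, -1, 1, -1], [1, -1, -1, 1]], dtype=float)
# The three symmetry classes of pairs (0-based cell indices).
CLASSES = {"G1": [(0, 1), (2, 3)], "G2": [(0, 2), (1, 3)], "G3": [(0, 3), (1, 2)]}


def class1_variances(v):
    """Variances of the class 1 estimators of (v_M, v_R, v_C, v_RC), n = 2."""
    v = np.asarray(v, dtype=float)
    effects = H @ v
    table = {}
    for name, pairs in CLASSES.items():
        # Class 1 estimate of v_A from the pair (s, t): 2 * (V_A[s] v_s + V_A[t] v_t).
        est = np.array([2 * (H[:, s] * v[s] + H[:, t] * v[t]) for s, t in pairs])
        # Within a class the two pairs are equiprobable and the estimate is unbiased,
        # so the conditional variance is the mean squared deviation from the effects.
        table[name] = ((est - effects) ** 2).mean(axis=0)
    table["unconditional"] = np.mean([table[g] for g in CLASSES], axis=0)
    return table


for row, variances in class1_variances([8.5, 0.5, 2.5, -1.5]).items():
    print(row, np.round(variances, 2))
```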

The formulas for these estimators and their variances are not as simple as 
one might expect from the simplicity of the examples. To give some feeling for 


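TABLE 1
Variances for the numerical example $N = 4$, $v_{11} = 8.5$, $v_{12} = .5$, $v_{21} = 2.5$
and $v_{22} = -1.5$: class 2 estimators for $n = 2$ and $n = 3$
and for various λ-metrics

Metric                                                         n   var($\hat v_M$)   var($\hat v_R$)   var($\hat v_C$)   var($\hat v_{RC}$)
$\lambda_M = \lambda_R = \lambda_C = \lambda_{RC} = c$                       2     18.67       …         15.00       25.67
                                                               3      6.22       …          5.00        8.56
$\lambda_M = \infty$; $\lambda_R = \lambda_C = \lambda_{RC} = c$                 2     18.67       …         33.00       41.00
                                                               3       …         …           …         13.00
$\lambda_M = \lambda_R = \infty$; $\lambda_C = \lambda_{RC} = c$                 2       …         …           …        116.00
                                                               3       …         …           …         36.00
$\lambda_M = \lambda_C = \infty$; $\lambda_R = \lambda_{RC} = c$                 2     15.00       …           …         56.00
                                                               3      5.00       …           …         16.00
$\lambda_M = \lambda_R = \lambda_C = \infty$; $\lambda_{RC} = c$                 2       …       16.44       15.15       25.67
                                                               3       …        4.00        4.00        8.56
$\lambda_M/5 = \lambda_R/4 = \lambda_C/6 = \lambda_{RC}/2 = c$             2       …       16.04       13.49       23.40
                                                               3       …        3.14        2.84         …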










TABLE 2
Variances for the numerical example $N = 4$, $v_{11} = 8.5$, $v_{12} = .5$, $v_{21} = 2.5$
and $v_{22} = -1.5$: class 1 estimators for $n = 2$ and for any λ-metric

Conditioning            var($\tilde v_M$)   var($\tilde v_R$)   var($\tilde v_C$)   var($\tilde v_{RC}$)
Conditional on $G_1$         16.00          25.00           4.00          36.00
Conditional on $G_2$         36.00           4.00          25.00          16.00
Conditional on $G_3$          4.00          36.00          16.00          25.00
Unconditional              18.67          21.67          15.00          25.67


actual numbers produced by these methods, consider the numerical example
where $v_M = 5$, $v_R = 4$, $v_C = 6$ and $v_{RC} = 2$, i.e., the basic $2^2$ array where
$$v_{11} = 8.5, \quad v_{12} = .5, \quad v_{21} = 2.5 \quad \text{and} \quad v_{22} = -1.5.$$
Variances pertaining to this example are presented in Tables 1
and 2.

Table 1 illustrates how variances corresponding to various λ-metrics which
might be used in practice are related to variances corresponding to the unknown
optimum λ-metric $\lambda_M/5 = \lambda_R/4 = \lambda_C/6 = \lambda_{RC}/2$. A comparison of Table 1
and Table 2 shows that, unconditionally, $\operatorname{var}(\tilde v_A) > \operatorname{var}(\hat v_A)$ for the optimum
λ-metric, but that this inequality does not always hold for other λ-metrics.
The inequality illustrates the general fact that the minimum variance of estimators
in the class of $U_2$-unbiased estimators must be less than or equal to the
minimum variance of estimators in the more restricted class of $U_1$-unbiased
estimators.


6. Optimum properties. In this section we define a wide class of linear unbiased
estimators of $f_t(V)$ for any random allocation scheme and show that
certain estimators of the types defined in Section 4 are optimum in the wide
class. The criteria of optimality are minimum symmetrized variance among
$U_1$- or $U_2$-unbiased estimators and minimum symmetrized mean square error
among all estimators. These “symmetrized” criteria will be defined shortly.

Suppose the random allocation scheme permits exactly $d$ distinct $n$-spaces $E_n$,
each with positive probability. Consider $nd$ arbitrary real numbers to be used
as $d$ sets of $n$ coefficients applicable to the $n$ observations corresponding to each
possible $E_n$. The resulting linear combinations define the values of a random
variable whose random properties are induced by the random choice of $E_n$. As
the $nd$ coefficients assume all real values they define an $nd$-dimensional vector
space $O$ of random variables. Any $v^* \in O$ has for its average some linear combination
of the $N$ quantities $v_{ijk}$ of the underlying array, i.e.,
$$\text{ave}\{v^*\} = f_t(V) \quad \text{for some } V \in E,$$
where averaging is over the complete random choice of designs, and so $v^*$ is a
$U_2$-unbiased estimator of $f_t(V)$. For any given $V \in E$ we consider as our general













class of linear $U_2$-unbiased estimators of $f_t(V)$ all those $v^* \in O$ satisfying
$$\text{ave}\{v^*\} = f_t(V),$$
and we seek the optimum in this general class. Denote this subset of $O$ by $O(V)$.
One might first decide to seek that $v^* \in O(V)$ with minimum variance. Unfortunately
the class $O(V)$ is sufficiently large to include estimators which have
variance zero but which are of an uninteresting and asymmetrical type. Note
that in general the variance of an estimator is a quadratic function of the $v_{ijk}$,
and what we are doing is finding estimators which are unbiased for any $v_{ijk}$ but
whose variance is minimum for a particular set of $v_{ijk}$ (i.e., the “true” values).
For example, consider a simple random allocation scheme observing $n$ of the $N$
quantities $v_{ijk}$. Let us now define an unbiased estimator of, say, $v_{111}$, which will
have variance zero when the $v_{ijk}$ are in fact equal to a set of numbers $x_{ijk}$. The
coefficients in this estimator will depend, of course, on the $x_{ijk}$. Suppose for
simplicity that all the $x_{ijk}$ are non-zero, and define $z_{ijk} = v_{ijk}/x_{ijk}$. Define the random
variable $Z$ to be the mean of the $n$ observed values of $z_{ijk}$. Define the random variable
$$Z_{111} = \begin{cases} \dfrac{N-1}{n-1}\, z_{111} - \dfrac{N-n}{n-1}\, Z & \text{if } v_{111} \text{ is observed},\\[1ex] Z & \text{otherwise}. \end{cases}$$
Then it may easily be checked that $x_{111}Z_{111}$ is $U_2$-unbiased for $v_{111}$ and has zero
variance when $v_{ijk} = x_{ijk}$ for all $i$, $j$ and $k$. Similarly we can define zero variance
$U_2$-unbiased estimators for any $v_{ijk}$, and thence for $f_t(V)$ for any $V \in E$.
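The pathological estimator just described is easy to verify by enumeration. The sketch below flattens the array to $N$ cells, lets cell 0 play the role of $v_{111}$, and uses exact rational arithmetic; the function names and the particular numbers are arbitrary choices for illustration. It checks that $x_{111}Z_{111}$ averages to $v_{111}$ for arbitrary $v$ and has zero variance when $v = x$.

```python
from fractions import Fraction
from itertools import combinations


def x_times_Z(obs, v, x, target=0):
    """x_target * Z_target for one design (obs = indices of the observed cells)."""
    n, N = len(obs), len(v)
    z = {s: Fraction(v[s], x[s]) for s in obs}       # z = v / x on observed cells
    Z = sum(z.values()) / n                          # mean of the observed z's
    if target in obs:
        Zt = Fraction(N - 1, n - 1) * z[target] - Fraction(N - n, n - 1) * Z
    else:
        Zt = Z
    return x[target] * Zt


def mean_and_variance(v, x, n, target=0):
    """Exact mean and variance over all equiprobable n-subsets (simple random allocation)."""
    ests = [x_times_Z(obs, v, x, target) for obs in combinations(range(len(v)), n)]
    mean = sum(ests) / len(ests)
    return mean, sum((e - mean) ** 2 for e in ests) / len(ests)


x = [3, -1, 2, 5, 4, 7]          # the "guessed" values x_ijk (all non-zero)
v = [1, 2, -3, 4, 0, 6]          # some other set of true values
print(mean_and_variance(v, x, 3))
print(mean_and_variance(x, x, 3))
```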

In order to have a more interesting optimum $v^*$ we define the criterion of
symmetrized variance of an estimator $v^*$. Symmetry here refers to symmetry
under the group $\mathcal{G}$ of $P = (R!)(C!)(L!)$ permutations of the $R$ rows, $C$ columns
and $L$ layers of the basic array. For any $g \in \mathcal{G}$, any $V \in E$ and any $n$-space $E_n$
contained in $E$, we denote by $g(V)$ the vector in $E$ found by operating with $g$
on $V$, and we denote by $g(E_n)$ the subspace of $E$ found by operating with $g$ on
$E_n$. If $v^* \in O$ is a $U_2$-unbiased estimator of $f_t(V)$ and $v^* = f_t(V^*)$ where $V^* \in E_n$,
then for any $g \in \mathcal{G}$ we can define $g(v^*) \in O$, a $U_2$-unbiased estimator of $f_t(g(V))$,
to be $g(v^*) = f_t(g(V^*))$ when $g(E_n)$ is the subspace corresponding to the observations.
Then we define the symmetrized variance of $v^*$ to be
$$\rho^2(v^*) = \frac{1}{P}\sum_{g \in \mathcal{G}} \operatorname{var}\{g(v^*)\}.$$
Thus the group $\mathcal{G}$ breaks the space $O$ into mutually exclusive symmetry classes
of estimators such that the estimators of one class share a common symmetrized
variance, namely the mean variance over the class. The use of the criterion
$\rho^2(v^*)$ seems reasonable when one's a priori beliefs about the array of $v_{ijk}$ are
symmetrical under $\mathcal{G}$. For, if $v^*$ were adopted as the estimator of $f_t(V)$, then it
would be only reasonable to adopt $g(v^*)$ as the estimator of $f_t(g(V))$ and to
judge all such estimators together by their mean variance. For similar reasons










we may wish to use the criterion of symmetrized mean square error of $v^*$, defined
analogously as
$$\frac{1}{P}\sum_{g \in \mathcal{G}} \text{ave}\{[g(v^*) - f_t(g(V))]^2\}.$$
The following optimality Theorems 6.1, 6.2 and 6.3 will be proved in order
at the end of this section. Suppose we select any λ-metric such that
$$\frac{\lambda_M^2}{(MS)_M} = \frac{\lambda_R^2}{(MS)_R} = \dots = \frac{\lambda_{RCL}^2}{(MS)_{RCL}},$$
where the $(MS)_A$ are the mean squares resulting from the analysis of variance
of the complete array of $v_{ijk}$, and suppose we refer to this λ-metric as the optimum
λ-metric.

THEOREM 6.1. For any random allocation scheme and any $V \in E$, that $v^* \in O$
with minimum symmetrized mean square error for $f_t(V)$ is given by the λ-minimum
extension $f_\lambda(V)$ corresponding to the optimum λ-metric. The symmetrized mean
square error of $f_\lambda(V)$ is minimum both unconditionally and conditionally, with
conditioning on each symmetry class $G$ of designs.

THEOREM 6.2. For any random allocation scheme and any $V \in E$, that $v^* \in O$
which is $U_2$-unbiased for $f_t(V)$ with minimum symmetrized variance is given by
the class 2 estimator $\hat f_t(V)$ corresponding to the optimum λ-metric.

THEOREM 6.3. For any random allocation scheme and any $V \in E$, that $v^* \in O$
which is $U_1$-unbiased for $f_t(V)$ with minimum symmetrized variance is given by
the class 1 estimator $\tilde f_t(V)$ corresponding to the optimum λ-metric. The symmetrized
variance of $\tilde f_t(V)$ is minimum both unconditionally and conditionally, with
conditioning on each symmetry class $G$ of designs.

Note that in these theorems it is the same λ-minimum extension or class 2
estimator or class 1 estimator (i.e., the λ-minimum extension or class 2 estimator
or class 1 estimator corresponding to the same λ-metric) which is optimum for
all $V \in E$. Note also that in each case the optimum estimator depends on the
“true” underlying $v_{ijk}$ through its choice of optimum λ-metric.

In the case of underlying arrays with factors at two levels each, a slightly
different corollary of each theorem can be stated. Here the subspaces $E_A$ of $E$
are one-dimensional and the principal aims are to estimate the $v_A = f_t(V_A)$, where
$V_A \in E_A$. If $v^*$ is unbiased for $v_A$, then $g(v^*)$ is unbiased for $\pm v_A$ for all $g \in \mathcal{G}$,
and so it is natural to consider as possible estimators only those $v^* \in O$ which are
identical with $\pm g(v^*)$. For such estimators variance and symmetrized variance
are the same, and mean square error and symmetrized mean square error are
the same. Thus, for example, as a corollary to Theorem 6.2 we have that, in
the case of factors at two levels each (i.e., $N = 2^3$), among all estimators $v^* \in O$
which are $U_2$-unbiased for $v_A$ and symmetric in the sense that $v^* = \pm g(v^*)$ for all
$g \in \mathcal{G}$, the class 2 estimator $\hat v_A$ corresponding to the optimum λ-metric has
minimum variance. Similar corollaries clearly hold for Theorems 6.1 and 6.3.

We now prove Theorem 6.1. Suppose $W_1, \dots, W_N$ is a set of basis vectors
of $E$ which are unit orthogonal in the sense of the formal metric, and where
$W_1$ spans $E_M$, $W_2$ up to $W_R$ span $E_R$, $W_{R+1}$ up to $W_{R+C-1}$ span $E_C$, and so on.
Suppose we set out to estimate $f_t(V)$, where
$$V = \sum_{i=1}^{N} b_i W_i,$$
and for a particular $E_n$ suppose the estimator is
$$v^* = f_t(V^*), \quad \text{where } V^* = \sum_{i=1}^{N} a_i W_i$$
and where $V^* \in E_n$. We should like to minimize the contribution to the symmetrized
mean square error from the symmetry class $G$ of $E_n$ by choosing $V^* \in E_n$
to minimize
$$\frac{1}{P}\sum_{g \in \mathcal{G}} \left[\sum_{i=1}^{N} (a_i - b_i)\, g(w_i)\right]^2,$$
where $g(w_i)$ denotes $f_t(g(W_i))$. Suppose $(MS)_A$ is the mean square associated
with $E_A$ in the analysis of variance of the complete array of $v_{ijk}$, and suppose
$(MS)_i$ is defined to be $(MS)_A$ for the $A$ such that $W_i \in E_A$. Then the desired
result will follow if we show the last expression to be equal to
$$\sum_{i=1}^{N} (a_i - b_i)^2 (MS)_i,$$
for clearly this expression is minimized by choosing $V^*$ to be that vector in
$E_n$ nearest to $V$ in the sense of the optimum λ-metric, and this amounts to
choosing $v^* = f_\lambda(V)$ for the optimum λ-metric.

In order to prove the desired equality we need only show that
$$\frac{1}{P}\sum_{g \in \mathcal{G}} g(w_i)\, g(w_j) = 0 \quad (i \ne j), \qquad \frac{1}{P}\sum_{g \in \mathcal{G}} [g(w_i)]^2 = (MS)_i.$$
One way of regarding this problem is to suppose that the fixed array $v_{ijk}$ is
made into a random array by choosing at random with equal probabilities an
element $g \in \mathcal{G}$ and applying $g$ to the array. Under this scheme we are looking
for average squares and average cross-products for the set of degrees of freedom
corresponding to $W_1, \dots, W_N$. The first equality for $W_i$ and $W_j$ in different
subspaces $E_A$ is easily seen directly; for example, if $W_i \in E_R$ and $W_j \in E_{RCL}$, then
summation over those elements of $\mathcal{G}$ which leave rows unchanged is summation
for which $g(w_i)$ is constant, and hence this summation over $g(w_i)g(w_j)$ is zero
by the well-known property that triple interactions sum to zero when summed
over any of their indices. The remaining equalities are best shown indirectly.










Suppose $W_i$ and $W_j$ both belong to $E_A$. The left-hand sides of both of the above
sums are symmetric quadratic expressions and so can be expressed as linear
combinations of $(MS)_M, \dots, (MS)_{RCL}$. But the left-hand sides are clearly
unaffected by changes in any of these except $(MS)_A$, so that the right-hand
sides must in each case be a constant times $(MS)_A$. By supposing the $v_{ijk}$ to be
$N$ independent $N(0, 1)$ variables and by averaging both sides over this normal
variation, we deduce that the constants are as shown. This completes the proof
of Theorem 6.1.
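For the $2^2$ array the two identities can be checked by brute force, since $\mathcal{G}$ then consists of the $P = 2!\,2! = 4$ row and column permutations and each $E_A$ is one-dimensional, so that $(MS)_A = v_A^2$. The sketch below is illustrative (its names are hypothetical): it averages $g(w_A)\,g(w_B)$ over the group and should return the diagonal matrix of mean squares.

```python
import itertools

import numpy as np


def group_moments(v):
    """(1/P) * sum_g g(w_A) g(w_B) for a 2x2 array v under row/column permutations."""
    # Formal unit vectors V_M, V_R, V_C, V_RC laid out as 2x2 arrays.
    basis = 0.5 * np.array([[[1, 1], [1, 1]], [[1, 1], [-1, -1]],
                            [[1, -1], [1, -1]], [[1, -1], [-1, 1]]], dtype=float)
    perms = list(itertools.permutations(range(2)))
    acc = np.zeros((4, 4))
    for rows, cols in itertools.product(perms, perms):
        # g(W_A) permutes the cells of W_A; g(w_A) = f_t(g(W_A)) = sum over cells.
        gw = np.array([(W[np.ix_(rows, cols)] * v).sum() for W in basis])
        acc += np.outer(gw, gw)
    return acc / (len(perms) ** 2)


v = np.array([[8.5, 0.5], [2.5, -1.5]])
print(group_moments(v))
```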

To deduce Theorem 6.2 from Theorem 6.1 we need some relations between
symmetrized variance and symmetrized mean square error. Suppose $v^* \in O(V)$,
i.e., $v^* \in O$ and is $U_2$-unbiased for $f_t(V)$. Suppose $v^*$ has symmetrized variance
$\rho^2(v^*)$. We may define the symmetrized squared mean of $v^*$ to be
$$\mu^2(v^*) = \frac{1}{P}\sum_{g} [\text{ave}\{g(v^*)\}]^2 = \frac{1}{P}\sum_{g} [f_t(g(V))]^2,$$
and so the symmetrized mean square of $v^*$ is
$$\frac{1}{P}\sum_{g} \text{ave}\{[g(v^*)]^2\} = \rho^2(v^*) + \mu^2(v^*).$$
Among the statistics $kv^*$ for different $k$, suppose $v^{**}$ is the one with minimum
symmetrized mean square deviation from $f_t(V)$. It is easily seen that
$$v^{**} = \frac{\mu^2(v^*)}{\mu^2(v^*) + \rho^2(v^*)}\, v^*,$$
with symmetrized mean square deviation from $f_t(V)$ given by
$$\frac{\mu^2(v^*)\, \rho^2(v^*)}{\mu^2(v^*) + \rho^2(v^*)}.$$


It may also easily be checked that $v^*$ has minimum symmetrized variance in
$O(V)$ if and only if the corresponding $v^{**}$ has minimum symmetrized mean
square error among estimators in $O$ which are unbiased except for a constant
factor. Thus in a sense it is immaterial whether we find the optimum $v^*$ or the
corresponding optimum $v^{**}$. Theorem 6.1 tells us that $f_\lambda(V)$ for the optimum
λ-metric provides the minimum symmetrized mean square error estimator in
$O$, and Section 4.3 tells us that, for $V_A \in E_A$, $f_\lambda(V_A)$ is unbiased except for a
constant factor. Thus, for any $V_A \in E_A$, $f_\lambda(V_A)$ is the optimum $v^{**}$ described
above, and hence the corresponding $v^*$ is the class 2 estimator $\hat f_t(V_A)$. This proves
Theorem 6.2 for vectors $V$ of the special type belonging to an $E_A$ for some $A$.
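The formula for $v^{**}$ follows from expanding the symmetrized mean square deviation of $kv^*$ as a quadratic in $k$, using $\text{ave}\{g(v^*)\} = f_t(g(V))$ for every $g \in \mathcal{G}$:
$$\frac{1}{P}\sum_{g} \text{ave}\{[k\,g(v^*) - f_t(g(V))]^2\} = k^2\left[\rho^2(v^*) + \mu^2(v^*)\right] - 2k\,\mu^2(v^*) + \mu^2(v^*).$$
Setting the derivative with respect to $k$ to zero gives $k = \mu^2/(\mu^2 + \rho^2)$, and substituting back gives the minimum
$$\mu^2 - \frac{\mu^4}{\mu^2 + \rho^2} = \frac{\mu^2 \rho^2}{\mu^2 + \rho^2}.$$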

To complete the proof we need only show that the minimum symmetrized
variance estimator $v^* \in O(V)$ for any $V \in E$ can be written
$$v^* = \sum_A v_A^*,$$
where the corresponding $V$ can be written
$$V = \sum_A V_A$$
with $V_A \in E_A$, and where $v_A^*$ is the minimum symmetrized variance $U_2$-unbiased
estimator of $f_t(V_A)$. This gives the desired result, since it is known that
$$\hat f_t(V) = \sum_A \hat f_t(V_A).$$


As may easily be checked, a Euclidean metric may be defined for the vector space $O$
by setting the squared length of $v^* \in O$ equal to its symmetrized variance $\rho^2(v^*)$.
Define $O_1$ to be the subspace of $O$ consisting of all $v^* \in O$ whose average is identically
zero, and define $O_2$ to be the subspace of $O$ orthogonal to $O_1$ according to
the ρ-metric. Any $v^* \in O(V)$ can be written as $v^* = v_1 + v_2$, where
$$v_i \in O_i \quad (i = 1, 2).$$
Clearly $v_2 \in O(V)$ and, since
$$\rho^2(v^*) = \rho^2(v_1 + v_2) = \rho^2(v_1) + \rho^2(v_2) \ge \rho^2(v_2),$$
$v_2$ has minimum symmetrized variance in $O(V)$. If
$$V = \sum_A V_A$$
and if $v_A^*$ is any element of $O(V_A)$, then
$$v^* = \sum_A v_A^* \in O(V).$$
Also, if $v^* = v_1 + v_2$ and $v_A^* = v_{1A} + v_{2A}$, where $v_{1A}$ and $v_{2A}$ belong to $O_1$ and $O_2$
respectively, then
$$v_2 = \sum_A v_{2A},$$
which is the desired result, completing the proof of Theorem 6.2.

Theorem 6.3 follows immediately from the application of Theorem 6.2 to
random allocation schemes with just one symmetry class of designs.


7. Generalization of the theory for the basic model. In this section we suppose
the observations $v_{ijk}$ to be random, such that $v_{ijk} = \nu_{ijk} + \epsilon_{ijk}$, where the $\nu_{ijk}$ are
constant and $\text{ave}\{\epsilon_{ijk}\} = 0$. We suppose the $\epsilon_{ijk}$ for all $i$, $j$ and $k$ to have arbitrary
variances and covariances. The $\epsilon_{ijk}$ are assumed independent of the random
choice of design. The same estimators used for the basic model can be considered
for the generalized model, but when we compute their means and variances we
will average over the randomness of the $\epsilon_{ijk}$ in addition to the randomness of
the choice of design. For example, an estimator is now defined to be $U_1$-unbiased
if it is unbiased under averaging over both the $\epsilon_{ijk}$ and the random choice of $E_n$,
conditional on each symmetry class $G$ of designs. $U_2$-unbiasedness is similarly
defined, omitting the conditioning provision.

The total functional $f_t(V)$ is now redefined in terms of the $\nu_{ijk}$ rather than the $v_{ijk}$.
Thus
$$f_t\Big(\sum x_{ijk} V_{ijk}\Big) = \sum x_{ijk}\, \nu_{ijk}.$$










Clearly any estimator which was $U_1$-unbiased for $f_t(V)$ in the basic model remains
$U_1$-unbiased for $f_t(V)$, with the new definitions, under the generalized
model. The same statement holds for $U_2$-unbiasedness. The optimality theorems
of Section 6 remain valid, along with their proofs, provided only that $(MS)_A$ is
replaced by $\text{ave}\{(MS)_A\}$, where averaging is over the randomness induced by
the $\epsilon_{ijk}$. Thus the optimum λ-metric becomes a λ-metric such that
$$\lambda_A^2 / \text{ave}\{(MS)_A\}$$
is constant for all $A$. It has now been shown that the entire theory given for the
basic model generalizes with no gaps to the generalized model.

The generalized model covers a standard Model I analysis of variance model,
this being the case where the $\epsilon_{ijk}$ have common variance and zero covariance.
An application with a more general set of variances and covariances would be as
follows. Suppose our notion of a random allocation design were broadened to
allow random replication of certain cells. We may suppose in this case our
estimators of $f_t(V)$ to be based on cell means. If the basic observations have a
Model I structure, then the cell means, with differing numbers of observations
per cell, do not. However, if our random allocation scheme required the sets of cells
with no replication, with one replicate, etc., each to be chosen by probability
schemes symmetrical under $\mathcal{G}$, then the cell means can evidently be treated as
observations under the generalized model, and the theory applies.

REFERENCES 

[1] F. J. Anscombe, “Quick analysis methods for random balance screening experiments,”
Technometrics, Vol. 1 (1959), pp. 195-209.

[2] S. Barbacki and R. A. Fisher, “A test of the supposed precision of systematic arrangements,”
Ann. Eugenics, Vol. 7 (1936), pp. 189-193.

[3] T. A. Budne, “The application of random balance designs,” Technometrics, Vol. 1
(1959), pp. 139-155.

[4] A. P. Dempster, “Random allocation designs, II: approximate theory for simple random
allocation,” submitted for publication in the Ann. Math. Stat.

[5] R. A. Fisher, “Uncertain inference,” Proc. Amer. Acad. Arts and Sciences, Vol. 71,
No. 4 (1936), pp. 245-258 (reprinted as paper no. 27 in R. A. Fisher, Contributions
to Mathematical Statistics, John Wiley and Sons, New York, 1950).

[6] F. E. Satterthwaite, “Random balance experimentation,” Technometrics, Vol. 1
(1959), pp. 111-137.

[7] P. R. Halmos, Finite Dimensional Vector Spaces, D. Van Nostrand, Princeton, 1958.

[8] “Student,” “Comparison between balanced and random arrangements of field plots,”
Biometrika, Vol. 29 (1937), pp. 363-379.

[9] M. B. Wilk and Oscar Kempthorne, “Some aspects of the analysis of factorial experiments
in a completely randomized design,” Ann. Math. Stat., Vol. 27 (1956),
pp. 950-985.

[10] W. J. Youden, Oscar Kempthorne, J. W. Tukey, G. E. P. Box and J. S. Hunter,
“Discussion of the papers of Messrs. Satterthwaite and Budne,” Technometrics,
Vol. 1 (1959), pp. 157-193.








A MIXED MODEL FOR THE COMPLETE THREE-WAY LAYOUT
WITH TWO RANDOM-EFFECTS FACTORS¹


By J. P. IMHOF
University of California, Berkeley 


1. Summary. In the present paper the Mixed Model developed by Scheffé [10]
for the complete two-way layout is extended to the complete three-way layout
with two random-effects factors. The model involves three basic covariance
matrices of unknown parameters in addition to the error variance and fixed
effects. Assuming normality, tests of the usual statistical hypotheses, except that
of no fixed main effects, are derived from the analysis of variance table. Those of
no interaction between the fixed-effects and a random-effects factor are applicable
only under a simplifying assumption. A reduced form of the model is derived
which involves sets of independent identically distributed random vectors.
These are used to obtain unbiased estimators of the basic covariance matrices
and to construct a $T^2$ test of the hypothesis of no fixed main effects. This test
involves nonoptimum estimators of the effects, but this is shown to result in general
only in a small loss of power. Individual and simultaneous confidence intervals
for the fixed main effects are obtained in terms of these nonoptimum estimators.


2. Introduction. In analysis of variance problems involving both fixed-effects (or Model I) and random-effects (or Model II) factors, various Mixed Models have been proposed in recent years. Some of these arise as particular cases of very general models for which knowledge about distribution theory is at present only fragmentary [3], [12], [13]. Another approach consists in setting up a “normal theory” model with sufficient assumptions so that, in particular, exact tests of the standard hypotheses can be derived. This has been done by Scheffé [10] for the complete two-way layout. His model and the method of analysis derived from it can easily be extended to complete layouts of order higher than two, as long as there is only one random-effects factor. On the other hand, the analysis becomes considerably more intricate when the number of such factors is increased. We develop it for the case of three factors, two of which are Model II.

Matrices of the type

(2.1)  $$S = ((s_{ii'})), \qquad s_{ii'} = a\,\delta_{ii'} + b,$$

will frequently occur. Here $((s_{ii'}))$ denotes the matrix having elements $s_{ii'}$, where $i = 1, \dots, n$ refers to the row, $i' = 1, \dots, n$ to the column, and $\delta_{ii'}$ is the Kronecker δ. The canonical reduction of $S$ can be performed by using an orthogonal matrix having elements $n^{-1/2}$ in the first row. In Section 8 it will be advantageous to use for this reduction a matrix whose elements take as few different numerical values as possible.

Received January 19, 1959; revised March 16, 1960.

¹ This paper was prepared with the partial support of the Office of Naval Research (Nonr-222-43). This paper in whole or in part may be reproduced for any purpose of the United States Government. The paper is based on the author’s doctoral dissertation.

² Now at the University of Geneva.

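Lemma 1. The symmetric matrix $P$ of size $n$, with elements $p_{ii'} = n^{-1/2}$ if either $i = 1$ or $i' = 1$, and $p_{ii'} = \delta_{ii'} - (n-1)^{-1}(n^{-1/2} + 1)$ if $i > 1$ and $i' > 1$, is orthogonal. In particular, one has

(2.2)  $$\sum_{i''=2}^{n} p_{ii''}\,p_{i'i''} = \delta_{ii'} - n^{-1}, \qquad \sum_{i'=1}^{n} p_{ii'} = 0, \quad i = 2, \dots, n.$$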
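Lemma 1 and the reduction of matrices of type (2.1) can be checked numerically. The following pure-Python sketch builds $P$ from the element formulas above and confirms that it is orthogonal and that rows 2 through $n$ sum to zero, as stated in (2.2). It also shows that $PSP$ is diagonal with entries $a + nb, a, \dots, a$ when $S$ is of type (2.1). The helper names are ours and serve only this illustration.

```python
import math

def lemma1_matrix(n):
    """Symmetric matrix P of Lemma 1 (n >= 2), returned as a list of rows."""
    a = 1.0 / math.sqrt(n)                    # n^{-1/2}: first row and first column
    c = (a + 1.0) / (n - 1)                   # (n - 1)^{-1} (n^{-1/2} + 1)
    return [[a if (i == 0 or j == 0) else (1.0 if i == j else 0.0) - c
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def compound_symmetric(n, a, b):
    """Matrix of type (2.1): s_{ii'} = a * delta_{ii'} + b."""
    return [[a * (i == j) + b for j in range(n)] for i in range(n)]

n = 5
P = lemma1_matrix(n)
PP = matmul(P, P)                             # P is symmetric, so P P equals P P'
print(max(abs(PP[i][j] - (i == j)) for i in range(n) for j in range(n)))
D = matmul(matmul(P, compound_symmetric(n, 2.0, 0.5)), P)
print([round(D[i][i], 10) for i in range(n)])  # prints [4.5, 2.0, 2.0, 2.0, 2.0]
```

The first printed value is the largest deviation of $PP'$ from the identity and is at rounding level. Because the first row of $P$ is proportional to the vector of ones, only the first diagonal entry picks up the term $nb$.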


The following further conventions are made regarding the notation used. Vectors are column vectors. Matrices will always be square. The transpose of $A$ is written $A'$, and $\operatorname{tr} A$ is the trace of $A$. We write $A = ((A_{ii'}))$ for a matrix $A$ partitioned into submatrices $A_{ii'}$. Also, $((\text{diag. } A_1, \dots, A_r))$ is the matrix $A = ((A_{ii'}))$, $i, i' = 1, \dots, r$, where $A_{ii'} = \delta_{ii'}A_i$. If $X$ is a random vector, $EX$ is its expectation and $\Sigma_X$ its covariance matrix. The statement “$X$ is $N(\theta, \Sigma_X)$” means that $X$ is normal with expectation $\theta$ and covariance matrix $\Sigma_X$. The symbol $U$ is reserved exclusively for the identity matrix. Finally, a dot substituted for a subscript indicates that the average has been taken over all permissible values of the subscript.


3. The model. Basic formulas. Consider a complete three-way layout design involving the factors A, B and C. A is a fixed-effects factor appearing in the experiment at levels $i = 1, \dots, I$. B and C are random-effects factors. The levels of B and C at which the experiment is carried out are selected at random, and independently for B and C, from two conceptual infinite populations, which we represent as two abstract spaces $V$ and $W$. The selected levels are labeled $v_j$ ($j = 1, \dots, J$) and $w_k$ ($k = 1, \dots, K$) respectively. For each of the $IJK$ combinations of levels $(i, j, k)$, $L$ replications are performed. The observed responses are labeled $y_{ijkl}$. When no further indication is given, subscripts $i, j, k, l$ range over the values 1 to $I$, 1 to $J$, 1 to $K$ and 1 to $L$ respectively. It is always assumed that $I, J, K > 1$ and $L \ge 1$. The case $L = 1$ is formally included, but certain of the results are then obviously meaningless. In later sections, additional restrictions are imposed on $I$, $J$ and $K$.

An example may be obtained by extending the illustration given in [10] as follows. Think of an experiment in which $K$ batches of material are used by each of $J$ workers on each of $I$ different makes of machines, the output of the $j$th worker with the $k$th batch on the $i$th machine being determined separately over $L$ experimental periods, to yield the observed outputs $y_{ijkl}$. Here the workers and batches selected for the experiment are considered to be chosen randomly from the idealized infinite populations of workers and batches that might have been used in the experiment, and the replications are supposed to be carried out in such a way that they do not interact with the other factors, in particular not with the factor “worker”.










We assume that the response (in our example, the output) $y_{ijkl}$ has the structure

(3.1)  $$y_{ijkl} = m(i, v_j, w_k) + e_{ijkl},$$


where the $e_{ijkl}$ are “errors” and the $m(i, v_j, w_k)$ are “true cell means.” The random selection of $v_j$ in $V$ and, independently, of $w_k$ in $W$ justifies, in our view, the fundamental assumption that $m(i, v_j, w_k)$ is distributed for all $j$ and $k$ like a basic random variable $m(i, v, w)$, and that $m(i, v_j, w_k)$ is independent of $m(i', v_{j'}, w_{k'})$ when both $j \ne j'$ and $k \ne k'$. There is nothing, however, to justify assuming independence when $j = j'$ or $k = k'$. One can think of the two infinite populations from which the levels of factors B and C are drawn as corresponding to two probability distributions $\mathcal{P}_V$ and $\mathcal{P}_W$ on $V$ and $W$, so that $(V, \mathcal{P}_V)$ and $(W, \mathcal{P}_W)$ are probability spaces and the distribution of the random variable $m(i, v, w)$ is that of the real-valued function $m(i, v, w)$ on the product space $(V \times W, \mathcal{P}_V \times \mathcal{P}_W)$. One can then define the random variables


(3.2)  $$m(i, v, \cdot) = \int_W m(i, v, w)\,d\mathcal{P}_W(w), \qquad m(i, \cdot, w) = \int_V m(i, v, w)\,d\mathcal{P}_V(v).$$


The first of these would have, in our example, the interpretation of the true mean of a randomly selected worker labeled $v$ when he uses machine $i$, averaged over the population of batches. A similar interpretation can be given for $m(i, \cdot, w)$. The second-moment structure of the model depends essentially on three basic covariance matrices having elements

(3.3)  $$\sigma_{ii'} = \operatorname{Cov}\{m(i, v, w), m(i', v, w)\}, \qquad \nu_{ii'} = \operatorname{Cov}\{m(i, v, \cdot), m(i', v, \cdot)\}, \qquad \tau_{ii'} = \operatorname{Cov}\{m(i, \cdot, w), m(i', \cdot, w)\},$$

and on the linear combinations

(3.4)  $$\rho_{ii'} = \sigma_{ii'} - \nu_{ii'} - \tau_{ii'}.$$

Assume $\sigma_{ii} < \infty$ for all $i$. The relation $\rho_{ii} \ge 0$ obtained in Section 6 then implies finiteness of the $\nu_{ii}$’s and $\tau_{ii}$’s also.

The assumptions made so far are, we believe, realistic: they express what is implied by the random selection of the levels of B and, independently, of C from the two conceptual infinite populations $V$ and $W$ described above. Further assumptions, which are needed for computing the E(MS)’s (expected mean squares) and finding unbiased estimators, are less satisfactory: the errors $e_{ijkl}$ are assumed to be pairwise uncorrelated and to have zero means and a common variance $\sigma_e^2$. Furthermore, the $e_{ijkl}$ are assumed to be uncorrelated with the $m(i', v_{j'}, w_{k'})$ for all $i, i', j, j', k, k'$ and $l$. In the particular case $I = 1$, the model coincides with the one described in Section 7.4 of [11] for the complete two-way layout under Model II.

The exact distributions of test criteria, and exact confidence intervals for various parameters, are derived in later sections under an additional normality assumption, namely that the random variables $e_{ijkl}$, $m(i, v, w)$, $m(i, v, \cdot)$ and $m(i, \cdot, w)$ have joint normal distributions. For instance, $V$ and $W$ could be two real lines, $\mathcal{P}_V$ and $\mathcal{P}_W$ two independent normal distributions on them, and the functions $m(i, v, w)$, for each $i$, linear functions of $v$ and $w$. On the other hand, the case $m(i, v, w) = vw$ shows that joint normality of the $m(i, v, \cdot)$ and $m(i, \cdot, w)$ does not imply that of the $m(i, v, w)$. It can also be verified that joint normality of the $m(i, v, w)$ does not imply that of the $m(i, v, \cdot)$ and $m(i, \cdot, w)$.
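The counterexample $m(i, v, w) = vw$ can be made concrete with a quick Monte Carlo sketch. For illustration only, we assume that $v$ and $w$ are independent $N(1, 1)$ variables; the text says only that the two distributions are normal. Then $m(i, v, \cdot) = v$ and $m(i, \cdot, w) = w$ are jointly normal, whereas $vw$ has population kurtosis $57/9 \approx 6.3$ rather than the normal value 3. The sample kurtosis is used here as a simple signal of non-normality, not as a formal test.

```python
import random

def sample_kurtosis(xs):
    """m4 / m2**2 from central sample moments; about 3 for normal data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

rng = random.Random(1960)
N = 200_000
# Assumption for illustration: v and w independent N(1, 1), m(i, v, w) = v * w.
v = [rng.gauss(1.0, 1.0) for _ in range(N)]
w = [rng.gauss(1.0, 1.0) for _ in range(N)]
m_v_dot = v                                # m(i, v, .) = v * E(w) = v: normal
m_dot_w = w                                # m(i, ., w) = E(v) * w = w: normal
m_vw = [a * b for a, b in zip(v, w)]       # m(i, v, w) = v * w: not normal
print(round(sample_kurtosis(m_v_dot), 2), round(sample_kurtosis(m_vw), 2))
```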

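We define main effects and interactions in a natural and conventional way by letting

(3.5)
$$\mu = m(\cdot,\cdot,\cdot), \qquad \alpha_i^A = m(i,\cdot,\cdot) - m(\cdot,\cdot,\cdot),$$
$$\alpha^B(v) = m(\cdot,v,\cdot) - m(\cdot,\cdot,\cdot),$$
$$\alpha_i^{AB}(v) = m(i,v,\cdot) - m(i,\cdot,\cdot) - m(\cdot,v,\cdot) + m(\cdot,\cdot,\cdot),$$
$$\alpha^{BC}(v,w) = m(\cdot,v,w) - m(\cdot,v,\cdot) - m(\cdot,\cdot,w) + m(\cdot,\cdot,\cdot),$$
$$\alpha_i^{ABC}(v,w) = m(i,v,w) - m(i,v,\cdot) - m(i,\cdot,w) - m(\cdot,v,w) + m(i,\cdot,\cdot) + m(\cdot,v,\cdot) + m(\cdot,\cdot,w) - m(\cdot,\cdot,\cdot).$$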


In these formulas a dot substituted for $v$ or $w$ means that the expected value has been taken with respect to $\mathcal{P}_V$ or $\mathcal{P}_W$, as in (3.2), while a dot in place of $i$ means that the arithmetic average has been taken over the values $i = 1, \dots, I$. Thus $\mu$ is the general mean, $\alpha_i^A$ is the main effect of factor A at level $i$, $\alpha^B(v)$ is the main effect of factor B at level $v$, etc. Except for $\mu$ and the $\alpha_i^A$, all main effects and interactions are obviously random, and one finds at once from their definitions

(3.6)  $$E\alpha^B(v) = E\alpha_i^{AB}(v) = E\alpha^{BC}(v,w) = E\alpha_i^{ABC}(v,w) = 0.$$


We have omitted writing $\alpha^C(w)$ and $\alpha_i^{AC}(w)$ because their definitions are similar to those for $\alpha^B(v)$ and $\alpha_i^{AB}(v)$. As a general rule, when considering variance components, sums of squares, estimators, etc., “C” can be treated like “B” and “AC” like “AB” by substituting $k$ for $j$, $\tau_{ii'}$ for $\nu_{ii'}$, $m(i, \cdot, w)$ for $m(i, v, \cdot)$, etc.

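Next we define variance components by using the analogy with a “finite model” described in [10]. This leads to the natural definitions

(3.7)
$$\sigma_B^2 = \operatorname{Var}\alpha^B(v), \qquad \sigma_{AB}^2 = (I-1)^{-1}\sum_i \operatorname{Var}\alpha_i^{AB}(v),$$
$$\sigma_{BC}^2 = \operatorname{Var}\alpha^{BC}(v,w), \qquad \sigma_{ABC}^2 = (I-1)^{-1}\sum_i \operatorname{Var}\alpha_i^{ABC}(v,w).$$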


As usual, let

(3.8)  $$\sigma_A^2 = (I-1)^{-1}\sum_i (\alpha_i^A)^2.$$


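In terms of the basic parameters (3.3) and of their linear combinations (3.4), the variance components become

(3.9)
$$\sigma_B^2 = \nu_{\cdot\cdot}, \qquad \sigma_{AB}^2 = (I-1)^{-1}\sum_i (\nu_{ii} - \nu_{\cdot\cdot}),$$
$$\sigma_{BC}^2 = \rho_{\cdot\cdot}, \qquad \sigma_{ABC}^2 = (I-1)^{-1}\sum_i (\rho_{ii} - \rho_{\cdot\cdot}).$$

These relations are best derived from (3.15) below.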
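As a numerical illustration of (3.9), the sketch below computes $\sigma_B^2$ and $\sigma_{AB}^2$ from an assumed matrix $((\nu_{ii'}))$. The values are invented for the example. It also checks that the trace of the double-centred matrix $\nu^0$ of (3.14), divided by $I - 1$, gives $\sigma_{AB}^2$ again. The components $\sigma_{BC}^2$ and $\sigma_{ABC}^2$ follow in the same way, with $\rho$ in place of $\nu$.

```python
def dot_mean(M):
    """M_{..}: the average of all entries of an I x I matrix."""
    I = len(M)
    return sum(sum(row) for row in M) / (I * I)

def interaction_component(M):
    """(I - 1)^{-1} sum_i (M_ii - M_..), the form taken by sigma_AB^2 in (3.9)."""
    I = len(M)
    m = dot_mean(M)
    return sum(M[i][i] - m for i in range(I)) / (I - 1)

def double_centre(M):
    """M^0_{ii'} = M_ii' - M_i. - M_.i' + M_.., the centring used in (3.14)."""
    I = len(M)
    row = [sum(M[i]) / I for i in range(I)]
    col = [sum(M[i][j] for i in range(I)) / I for j in range(I)]
    m = dot_mean(M)
    return [[M[i][j] - row[i] - col[j] + m for j in range(I)] for i in range(I)]

# Hypothetical covariance matrix nu_{ii'} for I = 3 (illustrative values only).
nu = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
sigma2_B = dot_mean(nu)                     # sigma_B^2 = nu_..
sigma2_AB = interaction_component(nu)       # sigma_AB^2
nu0 = double_centre(nu)
# The trace of nu^0 divided by I - 1 gives sigma_AB^2 again.
print(sigma2_B, sigma2_AB, sum(nu0[i][i] for i in range(3)) / 2)
```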
For the levels $v_j$ ($j = 1, \dots, J$) and $w_k$ ($k = 1, \dots, K$) of $v$ and $w$ selected in the experiment, equations (3.5) identically give

$$m(i, v_j, w_k) = \mu + \alpha_i^A + \alpha^B(v_j) + \alpha^C(w_k) + \alpha_i^{AB}(v_j) + \cdots + \alpha_i^{ABC}(v_j, w_k).$$

This notation being cumbersome, we write simply

(3.10)  $$m_{ijk} = \mu + \alpha_i^A + \alpha_j^B + \alpha_k^C + \alpha_{ij}^{AB} + \alpha_{ik}^{AC} + \alpha_{jk}^{BC} + \alpha_{ijk}^{ABC},$$

and then (3.1) becomes

(3.11)  $$y_{ijkl} = m_{ijk} + e_{ijkl}.$$
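For all $j$ and $k$, the $\alpha_j^B$ are identically distributed like $\alpha^B(v)$, ..., the $\alpha_{ijk}^{ABC}$ are identically distributed like $\alpha_i^{ABC}(v, w)$, and (3.5) implies that

(3.12)  $$\alpha_\cdot^A = \alpha_{\cdot j}^{AB} = \alpha_{\cdot k}^{AC} = \alpha_{\cdot jk}^{ABC} = 0, \quad \text{all } j, k.$$

The main effects and interactions entering (3.10) are independent, except for the three pairs in (3.13), for which one easily finds

(3.13)  $$\operatorname{Cov}(\alpha_j^B, \alpha_{ij}^{AB}) = \nu_{i\cdot} - \nu_{\cdot\cdot}, \qquad \operatorname{Cov}(\alpha_k^C, \alpha_{ik}^{AC}) = \tau_{i\cdot} - \tau_{\cdot\cdot}, \qquad \operatorname{Cov}(\alpha_{jk}^{BC}, \alpha_{ijk}^{ABC}) = \rho_{i\cdot} - \rho_{\cdot\cdot}.$$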


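Further covariances will be needed. Let

(3.14)
$$\sigma_{ii'}^0 = \sigma_{ii'} - \sigma_{i\cdot} - \sigma_{\cdot i'} + \sigma_{\cdot\cdot}, \qquad \nu_{ii'}^0 = \nu_{ii'} - \nu_{i\cdot} - \nu_{\cdot i'} + \nu_{\cdot\cdot},$$
$$\tau_{ii'}^0 = \tau_{ii'} - \tau_{i\cdot} - \tau_{\cdot i'} + \tau_{\cdot\cdot}, \qquad \rho_{ii'}^0 = \rho_{ii'} - \rho_{i\cdot} - \rho_{\cdot i'} + \rho_{\cdot\cdot}.$$

One finds

(3.15)
$$\operatorname{Cov}(\alpha_{ij}^{AB}, \alpha_{i'j'}^{AB}) = \delta_{jj'}\nu_{ii'}^0, \qquad \operatorname{Cov}(\alpha_{ik}^{AC}, \alpha_{i'k'}^{AC}) = \delta_{kk'}\tau_{ii'}^0,$$
$$\operatorname{Cov}(\alpha_{jk}^{BC}, \alpha_{j'k'}^{BC}) = \delta_{jj'}\delta_{kk'}\rho_{\cdot\cdot}, \qquad \operatorname{Cov}(\alpha_{ijk}^{ABC}, \alpha_{i'j'k'}^{ABC}) = \delta_{jj'}\delta_{kk'}\rho_{ii'}^0,$$
$$\operatorname{Cov}(\alpha_j^B, \alpha_{i'j'k'}^{ABC}) = \operatorname{Cov}(\alpha_k^C, \alpha_{i'j'k'}^{ABC}) = \operatorname{Cov}(\alpha_{ij}^{AB}, \alpha_{i'j'k'}^{ABC}) = \operatorname{Cov}(\alpha_{ij}^{AB}, \alpha_{i'k'}^{AC}) = \operatorname{Cov}(\alpha_{ik}^{AC}, \alpha_{i'j'k'}^{ABC}) = 0,$$

all $i, i', j, j', k, k'$.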





The derivation of these relations is routine, based mainly on equalities like

(3.16)  $$\operatorname{Cov}(m_{ijk}, m_{i'j'k}) = \operatorname{Cov}\{m(i, \cdot, w), m(i', \cdot, w)\} = \tau_{ii'}, \qquad j \ne j'.$$

This follows from

$$E[m(i, v_j, w_k)\,m(i', v_{j'}, w_k)] = E\{E[m(i, v_j, w_k)\,m(i', v_{j'}, w_k) \mid w_k]\} = E[m(i, \cdot, w_k)\,m(i', \cdot, w_k)]$$

when $j \ne j'$.
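Relation (3.16) can also be checked by simulation. In the sketch below, the cell-mean function $m(i, v, w) = a_i v + b_i w + c_i vw$, with $v, w$ independent $N(0, 1)$, is a hypothetical choice made only for this illustration. For it, $m(i, \cdot, w) = b_i w$, so $\tau_{ii'} = b_i b_{i'}$. The code estimates the covariance between two cells that share the level $w_k$ but use different levels $v_j \ne v_{j'}$.

```python
import random

# Hypothetical cell-mean function, for illustration only:
# m(i, v, w) = a[i] v + b[i] w + c[i] v w, with v, w independent N(0, 1).
# Then m(i, ., w) = b[i] w, so tau_{ii'} = b[i] b[i'].
a, b, c = [1.0, -0.5], [0.8, 1.5], [0.3, 0.7]

def m(i, v, w):
    return a[i] * v + b[i] * w + c[i] * v * w

def mc_cov_shared_w(i, i2, n=100_000, seed=12):
    """Monte Carlo estimate of Cov(m_{ijk}, m_{i2 j' k}) for j != j':
    two independent levels of B, one shared level of C."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        v1, v2, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(m(i, v1, w))
        ys.append(m(i2, v2, w))
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

print(round(mc_cov_shared_w(0, 1), 3), b[0] * b[1])   # estimate vs tau_{12}
```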


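4. The analysis of variance table. Immediate consequences. In order to obtain point estimators of the variance components and test criteria for the usual statistical hypotheses, we proceed in the usual fashion, which consists in writing the SS’s (sums of squares) that one considers in the corresponding pure Model I complete three-way layout and computing the E(MS)’s, the numbers of d.f. (degrees of freedom) being the ranks of the quadratic forms defining the SS’s. Using (3.10), (3.11) and (3.12) one has

(4.1)  $$SS_A = JKL\sum_i (y_{i\cdot\cdot\cdot} - y_{\cdot\cdot\cdot\cdot})^2 = JKL\sum_i (\alpha_i^A + \alpha_{i\cdot}^{AB} + \alpha_{i\cdot}^{AC} + \alpha_{i\cdot\cdot}^{ABC} + e_{i\cdot\cdot\cdot} - e_{\cdot\cdot\cdot\cdot})^2,$$

and so forth, the well-known expressions for the SS’s yielding

(4.2)
$$SS_B = IKL\sum_j (\alpha_j^B - \alpha_\cdot^B + \alpha_{j\cdot}^{BC} - \alpha_{\cdot\cdot}^{BC} + e_{\cdot j\cdot\cdot} - e_{\cdot\cdot\cdot\cdot})^2,$$
$$SS_{AB} = KL\sum_i\sum_j (\alpha_{ij}^{AB} - \alpha_{i\cdot}^{AB} + \alpha_{ij\cdot}^{ABC} - \alpha_{i\cdot\cdot}^{ABC} + e_{ij\cdot\cdot} - e_{i\cdot\cdot\cdot} - e_{\cdot j\cdot\cdot} + e_{\cdot\cdot\cdot\cdot})^2,$$
$$SS_{BC} = IL\sum_j\sum_k (\alpha_{jk}^{BC} - \alpha_{j\cdot}^{BC} - \alpha_{\cdot k}^{BC} + \alpha_{\cdot\cdot}^{BC} + e_{\cdot jk\cdot} - e_{\cdot j\cdot\cdot} - e_{\cdot\cdot k\cdot} + e_{\cdot\cdot\cdot\cdot})^2,$$
$$SS_{ABC} = L\sum_i\sum_j\sum_k (\alpha_{ijk}^{ABC} - \alpha_{ij\cdot}^{ABC} - \alpha_{i\cdot k}^{ABC} + \alpha_{i\cdot\cdot}^{ABC} + e_{ijk\cdot} - e_{ij\cdot\cdot} - e_{i\cdot k\cdot} - e_{\cdot jk\cdot} + e_{i\cdot\cdot\cdot} + e_{\cdot j\cdot\cdot} + e_{\cdot\cdot k\cdot} - e_{\cdot\cdot\cdot\cdot})^2,$$
$$SS_e = \sum_i\sum_j\sum_k\sum_l (e_{ijkl} - e_{ijk\cdot})^2.$$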


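Consider now the computation of $E(MS_A)$. From (3.7), (3.8), (3.15) and the assumptions below (3.4) it follows that

$$E\,SS_A = JKL\sum_i [(\alpha_i^A)^2 + E(\alpha_{i\cdot}^{AB})^2 + E(\alpha_{i\cdot}^{AC})^2 + E(\alpha_{i\cdot\cdot}^{ABC})^2 + E(e_{i\cdot\cdot\cdot} - e_{\cdot\cdot\cdot\cdot})^2]$$
$$= (I-1)JKL[\sigma_A^2 + J^{-1}\sigma_{AB}^2 + K^{-1}\sigma_{AC}^2 + (JK)^{-1}\sigma_{ABC}^2 + (JKL)^{-1}\sigma_e^2].$$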


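Proceeding in the same fashion for the other SS’s leads to the following analysis of variance table:

(4.3)

SS       d.f.                      E(MS)
A        I − 1                     $JKL\sigma_A^2 + KL\sigma_{AB}^2 + JL\sigma_{AC}^2 + L\sigma_{ABC}^2 + \sigma_e^2$
B        J − 1                     $IKL\sigma_B^2 + IL\sigma_{BC}^2 + \sigma_e^2$
AB       (I − 1)(J − 1)            $KL\sigma_{AB}^2 + L\sigma_{ABC}^2 + \sigma_e^2$
BC       (J − 1)(K − 1)            $IL\sigma_{BC}^2 + \sigma_e^2$
ABC      (I − 1)(J − 1)(K − 1)     $L\sigma_{ABC}^2 + \sigma_e^2$
error    IJK(L − 1)                $\sigma_e^2$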


This coincides with the table one would obtain by applying the rules of Bennett and Franklin [1] for writing down E(MS)’s. The table (4.3) shows that the variance components (3.7) admit the unbiased estimators

(4.4)
$$\hat\sigma_B^2 = (IKL)^{-1}(MS_B - MS_{BC}), \qquad \hat\sigma_{BC}^2 = (IL)^{-1}(MS_{BC} - MS_e),$$
$$\hat\sigma_{AB}^2 = (KL)^{-1}(MS_{AB} - MS_{ABC}), \qquad \hat\sigma_{ABC}^2 = L^{-1}(MS_{ABC} - MS_e),$$

while as usual $\sigma_e^2$ has the unbiased estimator $\hat\sigma_e^2 = MS_e$. Here and in later sections, the caret is used to denote estimators which are unbiased but are not in general maximum likelihood estimators.
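The estimators read off table (4.3) are simple differences of mean squares divided by the coefficients of the E(MS) column. The sketch below transcribes them directly. The mean squares are invented for the example. Since the estimators are differences, they can come out negative, and this sketch deliberately leaves them untruncated. The components for C and AC would be handled in the same way.

```python
def variance_component_estimates(ms, I, K, L):
    """Unbiased variance-component estimators read off table (4.3).

    `ms` maps 'B', 'BC', 'AB', 'ABC', 'e' to observed mean squares. The estimates
    may come out negative; no truncation at zero is applied in this sketch.
    """
    return {
        "B": (ms["B"] - ms["BC"]) / (I * K * L),
        "BC": (ms["BC"] - ms["e"]) / (I * L),
        "AB": (ms["AB"] - ms["ABC"]) / (K * L),
        "ABC": (ms["ABC"] - ms["e"]) / L,
        "e": ms["e"],
    }

# Illustrative mean squares only (not data from the paper).
ms = {"B": 50.0, "BC": 10.0, "AB": 8.0, "ABC": 4.0, "e": 2.0}
print(variance_component_estimates(ms, I=3, K=4, L=2))
```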

Natural hypotheses to consider are

$$H_A: \sigma_A^2 = 0, \qquad H_B: \sigma_B^2 = 0, \qquad H_{BC}: \sigma_{BC}^2 = 0, \qquad H_{AB}: \sigma_{AB}^2 = 0, \qquad H_{ABC}: \sigma_{ABC}^2 = 0.$$

The hypothesis $H_A$ will be considered in Section 8. For the other hypotheses (i.e., those relative to random effects), the table of E(MS)’s suggests using the criteria $MS_B/MS_{BC}$, $MS_{BC}/MS_e$, $MS_{AB}/MS_{ABC}$ and $MS_{ABC}/MS_e$ respectively.
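In order to get some insight into the meaning of the different hypotheses considered, notice that $H_B \Leftrightarrow \nu_{\cdot\cdot} = 0$ or, using (3.7), $H_B \Leftrightarrow m(\cdot, v, \cdot)$ has a degenerate distribution, $m(\cdot, v, \cdot) = c$ a.s. (i.e., with probability one). This corresponds exactly to the intuitive idea of no main effect due to factor B. Similarly, one can write

(4.5)
$$H_{BC} \Leftrightarrow \rho_{\cdot\cdot} = 0 \Leftrightarrow m(\cdot, v, w) = m(\cdot, v, \cdot) + m(\cdot, \cdot, w) + c \ \text{a.s.},$$
$$H_{AB} \Leftrightarrow \nu_{ii'}^0 = 0 \text{ for all } i, i' \Leftrightarrow m(i, v, \cdot) = m(\cdot, v, \cdot) + c_i \ \text{a.s.},$$
$$H_{ABC} \Leftrightarrow \rho_{ii'}^0 = 0 \text{ for all } i, i' \Leftrightarrow m(i, v, w) - m(i, v, \cdot) - m(i, \cdot, w) = m(\cdot, v, w) - m(\cdot, v, \cdot) - m(\cdot, \cdot, w) + c_i \ \text{a.s.}$$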


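Consider first testing $H_B$ and $H_{BC}$. Letting $v_{jk} = \alpha_{jk}^{BC} + e_{\cdot jk\cdot}$ gives

$$SS_B = IKL\sum_j (\alpha_j^B - \alpha_\cdot^B + v_{j\cdot} - v_{\cdot\cdot})^2, \qquad SS_{BC} = IL\sum_j\sum_k (v_{jk} - v_{j\cdot} - v_{\cdot k} + v_{\cdot\cdot})^2.$$

According to (3.15) the variables $\{\alpha_j^B, v_{jk}\}$ are mutually independent and $\operatorname{Var} v_{jk} = \sigma_{BC}^2 + (IL)^{-1}\sigma_e^2$. Using the familiar results of Model I theory one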





finds that, when the normality assumptions described above (3.5) are made,

$$\frac{MS_B}{MS_{BC}} \sim \frac{IKL\sigma_B^2 + IL\sigma_{BC}^2 + \sigma_e^2}{IL\sigma_{BC}^2 + \sigma_e^2}\,F_{J-1,\,(J-1)(K-1)},$$

where $F_{m,n}$ is an F-variable with $m$ and $n$ d.f. Thus we reject $H_B$ at the $\alpha$ level of significance if $MS_B/MS_{BC}$ is larger than the upper $\alpha$-point of the $F_{J-1,(J-1)(K-1)}$ distribution. In the same fashion,

$$\frac{MS_{BC}}{MS_e} \sim \frac{IL\sigma_{BC}^2 + \sigma_e^2}{\sigma_e^2}\,F_{(J-1)(K-1),\,IJK(L-1)}.$$

In this case one can test the more realistic hypothesis $H_{BC}^0: \sigma_{BC}^2/\sigma_e^2 \le \theta$ by rejecting it if $(IL\theta + 1)^{-1}MS_{BC}/MS_e$ exceeds the upper $\alpha$-point of the F-distribution with the above numbers of d.f.
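The three criteria just described amount to simple arithmetic on the mean squares, as the following sketch shows. The function name and the mean squares are illustrative assumptions. Each entry returns a statistic together with its degrees of freedom. The upper $\alpha$-point itself would come from an F table, or, if SciPy happens to be available, from `scipy.stats.f.isf(alpha, dfn, dfd)`. SciPy is not used here, so the block stays self-contained.

```python
def random_effect_f_ratios(ms, I, J, K, L, theta=0.0):
    """Criteria for H_B, H_BC and H_BC^0 read off table (4.3).

    Each entry is (statistic, numerator d.f., denominator d.f.); the statistic is
    compared with the upper alpha-point of the F distribution with those d.f.
    """
    df_B, df_BC, df_e = J - 1, (J - 1) * (K - 1), I * J * K * (L - 1)
    f_bc = ms["BC"] / ms["e"]
    return {
        "H_B": (ms["B"] / ms["BC"], df_B, df_BC),
        "H_BC": (f_bc, df_BC, df_e),
        # H_BC^0: sigma_BC^2 / sigma_e^2 <= theta (theta = 0 gives back H_BC)
        "H_BC0": (f_bc / (I * L * theta + 1.0), df_BC, df_e),
    }

ms = {"B": 50.0, "BC": 10.0, "e": 2.0}            # illustrative values only
print(random_effect_f_ratios(ms, I=3, J=5, K=4, L=2, theta=0.5))
```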


5. The hypotheses $H_{AB}$ and $H_{ABC}$. We investigate in this section the distributions of the ratios $MS_{AB}/MS_{ABC}$ and $MS_{ABC}/MS_e$ suggested by (4.3) for testing $H_{AB}$ and $H_{ABC}$. The normality assumptions are made throughout. Let

(5.1)  $$b_{ijk} = \alpha_{ij}^{AB} + \alpha_{ijk}^{ABC} + e_{ijk\cdot}, \qquad c_{ijk} = b_{ijk} - b_{i\cdot k} - b_{\cdot jk} + b_{\cdot\cdot k}.$$

Then

(5.2)  $$SS_{AB} = KL\sum_i\sum_j c_{ij\cdot}^2, \qquad SS_{ABC} = L\sum_i\sum_j\sum_k (c_{ijk} - c_{ij\cdot})^2.$$

Using (3.15) gives

(5.3)  $$\operatorname{Cov}(b_{ijk}, b_{i'j'k'}) = \delta_{jj'}[\nu_{ii'}^0 + \delta_{kk'}(\rho_{ii'}^0 + \delta_{ii'}L^{-1}\sigma_e^2)].$$

Then, noticing e.g. that $\nu_{i\cdot}^0 = \rho_{i\cdot}^0 = 0$, one finds

(5.4)  $$\operatorname{Cov}(c_{ijk}, c_{i'j'k'}) = (\delta_{jj'} - J^{-1})[\nu_{ii'}^0 + \delta_{kk'}\{\rho_{ii'}^0 + (\delta_{ii'} - I^{-1})L^{-1}\sigma_e^2\}],$$

from which it follows that $\operatorname{Cov}(c_{ijk} - c_{ij\cdot}, c_{i'j'\cdot}) = 0$. Hence $SS_{AB}$ and $SS_{ABC}$ are statistically independent, and we need only investigate their distributions separately. For this purpose a well-known result [2] relative to quadratic forms in normal variables is used: if the vector $X$ is $N(0, \Sigma)$, the quadratic form $X'QX$ has the distribution of $\sum_r \lambda_r\chi^2_{(r),1}$, where the $\chi^2$ variables, each with one d.f., are independent and the coefficients $\lambda_r$ are the nonzero latent roots of the matrix $\Sigma Q$. Consider first $SS_{AB}$. According to (5.1) it can be written $SS_{AB} = KL(X'X)$, where $X$ is the vector

$$X = (x_{11}, \dots, x_{1J}, \dots, x_{i1}, \dots, x_{iJ}, \dots, x_{I1}, \dots, x_{IJ})', \qquad x_{ij} = c_{ij\cdot}.$$

The elements of its covariance matrix $\Sigma$ are found from (5.4) to be given by

$$\operatorname{Cov}(x_{ij}, x_{i'j'}) = (\delta_{jj'} - J^{-1})[\nu_{ii'}^0 + K^{-1}\{\rho_{ii'}^0 + (\delta_{ii'} - I^{-1})L^{-1}\sigma_e^2\}].$$


Thus one can write $\Sigma = ((\Sigma_{ii'}))$, where each submatrix $\Sigma_{ii'}$ of size $J$ has the structure (2.1) and has row sums equal to zero. Let $P^* = ((\text{diag. } P, \dots, P))$, the $I$ diagonal blocks $P$ of size $J$ being as in Lemma 1. The nonzero latent roots of $\Sigma$, which equal those of $P^*\Sigma P^*$, are then found to be the nonzero latent roots of the matrix of size $I(J-1)$, $M^* = ((M_{ii'}))$, where the $M_{ii'}$ are diagonal matrices having their $J - 1$ diagonal elements equal to $m_{ii'} = \nu_{ii'}^0 + K^{-1}\rho_{ii'}^0 + (IKL)^{-1}(I\delta_{ii'} - 1)\sigma_e^2$. Letting $M = ((m_{ii'}))$, one verifies that $|M^* - \lambda U| = |M - \lambda U|^{J-1}$ (where the identity matrix $U$ has in each case the proper size), and so the nonzero latent roots of $M^*$ are, each with order of multiplicity $J - 1$, those of $M$. Now, substituting in $|M - \lambda U|$ the sum of the columns (all components of which equal $-\lambda$) for column $I$, then row $i$ minus row $I$ for row $i$ ($i = 1, \dots, I-1$), and expanding the resulting determinant in terms of the elements of the last row, one finally obtains that


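(5.5)  $$SS_{AB} = \sum_{r=1}^{I-1}\epsilon_r\,\chi^2_{(r),J-1},$$

where the $I - 1$ variables $\chi^2_{(r),J-1}$ have independent $\chi^2$ distributions with $J - 1$ d.f. each and $\epsilon_1, \dots, \epsilon_{I-1}$ are the latent roots of the matrix of size $I - 1$

(5.6)  $$C = A + B + \sigma_e^2U,$$

where

(5.7)  $$A = KL((\nu_{rr'}^0 - \nu_{Ir'}^0)), \qquad B = L((\rho_{rr'}^0 - \rho_{Ir'}^0)), \qquad r, r' = 1, \dots, I-1.$$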


As a general rule we shall substitute subscripts $r, r'$ for the subscripts $i, i'$ whenever the range of values is 1 to $I - 1$ instead of 1 to $I$. One has $\sum_r \epsilon_r = \operatorname{tr} C = (I-1)(KL\sigma_{AB}^2 + L\sigma_{ABC}^2 + \sigma_e^2)$. More can, in fact, be said about the $\epsilon$’s. Consider the matrix of size $I$, $H = ((KL\nu_{ii'}^0 + L\rho_{ii'}^0))$. It is a covariance matrix (of the vector with components $(KL)^{1/2}(\alpha_{ij}^{AB} + \alpha_{ij\cdot}^{ABC})$, $j$ fixed), hence its latent roots $\mu_1, \dots, \mu_I$ are $\ge 0$. Thus the latent roots $\mu_i^* = \mu_i + \sigma_e^2$ of $H^* = H + \sigma_e^2U$ are $\ge \sigma_e^2$. But performing the same column and row operations as above (5.5) one finds that

$$|H^* - \mu'U| = (\sigma_e^2 - \mu')\,|C - \mu'U|,$$

where $C$ is given by (5.6). In other words, $I - 1$ of the latent roots of $H^*$ coincide with those of $C$, while the last one equals $\sigma_e^2$. Thus

Lemma 3. $\epsilon_r \ge \sigma_e^2$, $r = 1, \dots, I-1$, and $\epsilon_\cdot = KL\sigma_{AB}^2 + L\sigma_{ABC}^2 + \sigma_e^2$.
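The trace identity behind Lemma 3 can be checked numerically. The sketch below builds $C$ of (5.6)-(5.7) from assumed matrices $((\nu_{ii'}))$ and $((\rho_{ii'}))$, with level $I$ playing the role of the reference row, and compares $\operatorname{tr} C$ with $(I-1)(KL\sigma_{AB}^2 + L\sigma_{ABC}^2 + \sigma_e^2)$ computed through (3.9). The numerical values are invented for the example. Verifying the bound $\epsilon_r \ge \sigma_e^2$ in general would require an eigenvalue routine, which this sketch does not use.

```python
def double_centre(M):
    """M^0_{ii'} = M_ii' - M_i. - M_.i' + M_.., as in (3.14)."""
    I = len(M)
    row = [sum(r) / I for r in M]
    col = [sum(M[i][j] for i in range(I)) / I for j in range(I)]
    tot = sum(row) / I
    return [[M[i][j] - row[i] - col[j] + tot for j in range(I)] for i in range(I)]

def component(M):
    """(I - 1)^{-1} sum_i (M_ii - M_..), cf. (3.9)."""
    I = len(M)
    tot = sum(map(sum, M)) / I ** 2
    return sum(M[i][i] - tot for i in range(I)) / (I - 1)

def matrix_C(nu, rho, sigma2_e, K, L):
    """C = A + B + sigma_e^2 U of (5.6)-(5.7); size I - 1, last level I as reference."""
    I = len(nu)
    nu0, rho0 = double_centre(nu), double_centre(rho)
    last = I - 1                                   # 0-based index of level I
    return [[K * L * (nu0[r][s] - nu0[last][s]) + L * (rho0[r][s] - rho0[last][s])
             + (sigma2_e if r == s else 0.0) for s in range(I - 1)] for r in range(I - 1)]

# Hypothetical covariance matrices for I = 3 (illustrative values only).
nu = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
rho = [[1.0, 0.3, 0.1], [0.3, 2.0, 0.4], [0.1, 0.4, 1.5]]
K, L, s2 = 4, 2, 0.7
C = matrix_C(nu, rho, s2, K, L)
trace_C = sum(C[r][r] for r in range(len(C)))
print(trace_C, (len(nu) - 1) * (K * L * component(nu) + L * component(rho) + s2))
```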
Next consider $SS_{ABC}$. Define the vector

(5.8)  $$X^* = (c_{111}, \dots, c_{11K}, c_{121}, \dots, c_{12K}, \dots, c_{IJ1}, \dots, c_{IJK})'.$$

Then $SS_{ABC} = L\,X^{*\prime}Q^*X^*$, where $Q^* = ((\text{diag. } Q, \dots, Q))$ and the $IJ$ diagonal blocks $Q$ of size $K$ are given by $Q = ((\delta_{kk'} - K^{-1}))$. The method used above to reduce the computation of the latent roots of $\Sigma$ can now be applied to $\Sigma^*Q^*$, where the elements of the covariance matrix $\Sigma^*$ of $X^*$ are given by (5.4). One finds that

(5.9)  $$SS_{ABC} = \sum_{r=1}^{I-1}\epsilon_r'\,\chi^2_{(r),(J-1)(K-1)},$$





where the $I - 1$ variables $\chi^2_{(r),(J-1)(K-1)}$ have independent $\chi^2$ distributions with $(J-1)(K-1)$ d.f. each and $\epsilon_1', \dots, \epsilon_{I-1}'$ are the latent roots of the matrix of size $I - 1$

(5.10)  $$D = B + \sigma_e^2U,$$

where $B$ is given by (5.7). The analogue of Lemma 3 is here

Lemma 4. $\epsilon_r' \ge \sigma_e^2$, $r = 1, \dots, I-1$, and $\epsilon_\cdot' = L\sigma_{ABC}^2 + \sigma_e^2$.

Consider the test of the hypothesis $H_{AB}$, based on the criterion $MS_{AB}/MS_{ABC}$. Under $H_{AB}$, (3.14) and (4.5) show that the matrix $A$ of (5.7) reduces to a zero matrix and so, comparing (5.6) and (5.10), one has $\epsilon_r = \epsilon_r'$. Hence $MS_{AB}/MS_{ABC}$ has the distribution of

(5.11)  $$(K-1)\,\frac{\sum_r \epsilon_r\,\chi^2_{(r),J-1}}{\sum_r \epsilon_r\,\chi^2_{(r),(J-1)(K-1)}},$$

where all $\chi^2$ variables are independent. This distribution is simple only if $\epsilon_1 = \cdots = \epsilon_{I-1} = \epsilon_\cdot$, and it is easily verified that this is the case if, and only if, the matrix $((\rho_{ii'}))$ has the structure (2.1). Then $MS_{AB}/MS_{ABC}$ has the F-distribution with $(I-1)(J-1)$ and $(I-1)(J-1)(K-1)$ d.f. Letting $b_i(v, w) = \alpha_i^{ABC}(v, w) + \alpha^{BC}(v, w)$, one has $\rho_{ii'} = \operatorname{Cov}[b_i(v, w), b_{i'}(v, w)]$. This does not help much in giving the above restriction on the $\rho_{ii'}$ a simple physical significance. However, the stronger assumption that all three covariance matrices (3.3) have the structure (2.1) carries more intuitive meaning.

The situation is simpler when testing $H_{ABC}$. Under this hypothesis, $B$ of (5.7) is a zero matrix, so that the criterion $MS_{ABC}/MS_e$ has under the hypothesis the F-distribution with $(I-1)(J-1)(K-1)$ and $IJK(L-1)$ d.f. Concerning the power of the test, one might remark as follows. Specifying values for $\sigma_{ABC}^2$ and $\sigma_e^2$ does not specify a unique alternative but a subclass, say $\mathcal{C}(\sigma_{ABC}^2, \sigma_e^2)$, of alternatives. Among these, one might intuitively feel that the hardest ones to distinguish from the hypothesis $H_{ABC}: \rho_{ii'}^0 = 0$ for all $i, i'$ are the ones for which $((\rho_{ii'}))$ has the structure (2.1). When such is the case, $\epsilon_1' = \cdots = \epsilon_{I-1}' = L\sigma_{ABC}^2 + \sigma_e^2$, so that $\sigma_e^2(L\sigma_{ABC}^2 + \sigma_e^2)^{-1}MS_{ABC}/MS_e$ has the F-distribution and the power is immediately computable. Another reason why the power against these particular alternatives can be expected to be a lower bound (or at least nearly so) for the power at all alternatives in $\mathcal{C}(\sigma_{ABC}^2, \sigma_e^2)$ is that the cumulants of (5.9), obtainable from formula (2.3) of [2], are minimum when $\epsilon_1' = \cdots = \epsilon_{I-1}'$. The cumulants being positive, the same is true for the moments. Now $SS_{ABC}$ is $\ge 0$, its mean is fixed for fixed $\sigma_{ABC}^2$ and $\sigma_e^2$, and one therefore expects its distribution to have least mass in the tail when the moments are smallest. The same should then also be true for the distribution of $MS_{ABC}/MS_e$, which indeed (at least for small enough values of the level of significance) means that the power is lowest when $((\rho_{ii'}))$ has the structure (2.1).
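The cumulant argument can be illustrated with the standard cumulants of a chi-square variable. For a sum of independent terms $\epsilon_r'\chi^2_n$, the $s$-th cumulant is $2^{s-1}(s-1)!\,n\sum_r \epsilon_r'^{\,s}$. We use this standard form here and have not checked that it matches the exact layout of formula (2.3) of [2]. The sketch compares equal weights with unequal weights of the same sum, which gives the same mean for (5.9). All higher cumulants come out smaller in the equal case.

```python
from math import factorial

def weighted_chi2_cumulant(eps, df, s):
    """s-th cumulant of sum_r eps[r] * chi2_df(r), with independent chi-square terms.

    Uses kappa_s(chi2_df) = 2**(s-1) * (s-1)! * df and additivity of cumulants.
    """
    return 2 ** (s - 1) * factorial(s - 1) * df * sum(e ** s for e in eps)

df = 6                              # e.g. (J - 1)(K - 1) with J = 3, K = 4
equal = [2.0, 2.0, 2.0]             # eps'_1 = eps'_2 = eps'_3
unequal = [1.0, 2.0, 3.0]           # same sum, hence the same mean of (5.9)
for s in (1, 2, 3, 4):
    print(s, weighted_chi2_cumulant(equal, df, s), weighted_chi2_cumulant(unequal, df, s))
```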


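6. Reduced form of the model. Sufficient statistics. By means of an orthogonal transformation, we obtain in this section certain sets of uncorrelated identically distributed vectors. Assuming normality, they yield sufficient statistics for the parameters of the model. Introduce the vector of observations, ordered as follows:

(6.1)  $$Y = (y_{1111}, \dots, y_{111L}, y_{1121}, \dots, y_{112L}, \dots, y_{11KL}, \dots, y_{1JKL}, \dots, y_{IJK1}, \dots, y_{IJKL})'.$$

When normality is assumed, its probability density is

(6.2)  $$p(Y) = \text{const.}\,|\Sigma_Y|^{-1/2}\exp\{-\tfrac{1}{2}(Y - EY)'\Sigma_Y^{-1}(Y - EY)\},$$

where, putting $\beta_i = \mu + \alpha_i^A$,

$$EY = (\beta_1, \dots, \beta_1, \beta_2, \dots, \beta_2, \dots, \beta_I, \dots, \beta_I)',$$

each $\beta_i$ being repeated $JKL$ times. The elements of $\Sigma_Y$ are given by

(6.3)  $$\operatorname{Cov}(y_{ijkl}, y_{i'j'k'l'}) = \begin{cases} \sigma_{ii'} + \delta_{ii'}\delta_{ll'}\sigma_e^2 & \text{if } j = j',\ k = k',\\ \nu_{ii'} & \text{if } j = j',\ k \ne k',\\ \tau_{ii'} & \text{if } j \ne j',\ k = k',\\ 0 & \text{if } j \ne j',\ k \ne k'. \end{cases}$$

Write $\Sigma_Y = \Lambda + \sigma_e^2U$ and partition $\Lambda$ into $\Lambda = ((\Lambda_{ii'}))$. The submatrices $\Lambda_{ii'}$ of size $JKL$ are then given by

(6.4)  $$\Lambda_{ii'} = \left(\begin{array}{cccc|cccc|c}
F_{ii'} & G_{ii'} & \cdots & G_{ii'} & H_{ii'} & 0 & \cdots & 0 & \cdots\\
G_{ii'} & F_{ii'} & \cdots & G_{ii'} & 0 & H_{ii'} & \cdots & 0 & \cdots\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \\
G_{ii'} & G_{ii'} & \cdots & F_{ii'} & 0 & 0 & \cdots & H_{ii'} & \cdots\\ \hline
H_{ii'} & 0 & \cdots & 0 & F_{ii'} & G_{ii'} & \cdots & G_{ii'} & \cdots\\
0 & H_{ii'} & \cdots & 0 & G_{ii'} & F_{ii'} & \cdots & G_{ii'} & \cdots\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \\
0 & 0 & \cdots & H_{ii'} & G_{ii'} & G_{ii'} & \cdots & F_{ii'} & \cdots\\ \hline
\vdots & & & & \vdots & & & & \ddots
\end{array}\right),$$

where each of the submatrices $F_{ii'}$, $G_{ii'}$, $H_{ii'}$ of size $L$ has all its elements equal to $\sigma_{ii'}$, $\nu_{ii'}$, $\tau_{ii'}$ respectively, and where we have written only the upper left $2KL \times 2KL$ corner of $\Lambda_{ii'}$, from which the structure of the whole matrix is clear.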

We shall reduce the exponent in (6.2) to a simple form by applying successively three symmetric orthogonal transformations based on the matrix defined in Lemma 1. More precisely, the matrices $P_1$, $P_2$ and $P$ below are defined like the matrix $P$ of Lemma 1, with $n$ taking on the values $L$, $K$ and $J$ respectively.

First, let

$$Z^{(1)} = P_1^*Y, \qquad P_1^* = ((\text{diag. } P_1, \dots, P_1)),$$

where $P_1^*$ of size $IJKL$ consists of $IJK$ diagonal blocks $P_1$ of size $L$. Then

$$z_{ijk1}^{(1)} = L^{1/2}y_{ijk\cdot}, \qquad Ez_{ijkl}^{(1)} = \delta_{l1}L^{1/2}\beta_i, \qquad \Sigma_{Z^{(1)}} = P_1^*\Lambda P_1^* + \sigma_e^2U.$$








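The matrix $P_1^*\Lambda P_1^*$ has the same structure as $\Lambda$, except that, e.g., $F_{ii'}$ has to be replaced by $P_1F_{ii'}P_1$, the only nonzero element of which is the (1,1)-element, which equals $L\sigma_{ii'}$. Therefore,

(6.5)  $$(Y - EY)'\Sigma_Y^{-1}(Y - EY) = (Z^{(1)} - EZ^{(1)})'\Sigma_{Z^{(1)}}^{-1}(Z^{(1)} - EZ^{(1)}) + \sigma_e^{-2}\sum_i\sum_j\sum_k\sum_{l>1}(z_{ijkl}^{(1)})^2,$$

where, on the right-hand side, $Z^{(1)}$ denotes the vector of dimension $IJK$ with components $z_{ijk} = z_{ijk1}^{(1)}$, which we order as in (5.8). Writing $\Sigma_{Z^{(1)}} = L((\Sigma_{ii'})) + \sigma_e^2U$ ($U$ is now of size $IJK$ and each $\Sigma_{ii'}$ of size $JK$), one gets for $\Sigma_{ii'}$ a matrix like (6.4), but with entries $\sigma_{ii'}$, $\nu_{ii'}$ and $\tau_{ii'}$ instead of $F_{ii'}$, $G_{ii'}$ and $H_{ii'}$ respectively. Next, let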


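$$Z^{(2)} = P_2^*Z^{(1)}, \qquad P_2^* = ((\text{diag. } P_2, \dots, P_2)), \qquad P_2 = ((p_{kk'}^{(2)})),$$

where $P_2^*$ of size $IJK$ has $IJ$ diagonal blocks $P_2$ of size $K$. Then

(6.6)
$$z_{ij1}^{(2)} = K^{1/2}z_{ij\cdot} = (KL)^{1/2}y_{ij\cdot\cdot}, \qquad z_{ijk}^{(2)} = \sum_{k'}p_{kk'}^{(2)}z_{ijk'} \quad\text{for } k > 1,$$
$$Ez_{ij1}^{(2)} = (KL)^{1/2}\beta_i, \qquad Ez_{ijk}^{(2)} = 0 \quad\text{for } k > 1.$$

Furthermore, writing $\Sigma_{Z^{(2)}} = L((\Psi_{ii'})) + \sigma_e^2U$, one finds that if $\Psi_{ii'}$ is in turn partitioned into $J^2$ submatrices of size $K$, then it has the structure (2.1) with $n = J$ and $a$ and $b$ replaced respectively by the submatrices of size $K$

$$\begin{pmatrix} \sigma_{ii'} + (K-1)\nu_{ii'} - \tau_{ii'} & 0 & \cdots & 0\\ 0 & \sigma_{ii'} - \nu_{ii'} - \tau_{ii'} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_{ii'} - \nu_{ii'} - \tau_{ii'} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \tau_{ii'} & 0 & \cdots & 0\\ 0 & \tau_{ii'} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \tau_{ii'} \end{pmatrix}.$$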


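Finally, let

$$X = P_3^*Z^{(2)}, \qquad P_3^* = ((\text{diag. } P_3, \dots, P_3)), \qquad P_3 = ((P_{jj'})), \qquad P_{jj'} = ((\text{diag. } p_{jj'}, \dots, p_{jj'})) \ \text{with } ((p_{jj'})) = P,$$

where $P_3^*$ consists of $I$ diagonal blocks $P_3$ of size $JK$, each $P_{jj'}$ is of size $K$, and $P$ of size $J$ is as in Lemma 1. Then

(6.7)
$$x_{i11} = J^{1/2}z_{i\cdot 1}^{(2)} = (JKL)^{1/2}y_{i\cdot\cdot\cdot}, \qquad x_{ijk} = \sum_{j'}p_{jj'}z_{ij'k}^{(2)} \quad\text{for } (j, k) \ne (1, 1),$$
$$Ex_{i11} = (JKL)^{1/2}\beta_i, \qquad Ex_{ijk} = 0 \quad\text{for } (j, k) \ne (1, 1).$$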
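Writing

(6.8)  $$\Sigma_X = L((\Lambda_{ii'})),$$

one finds that $\Lambda_{ii'} = P_3\Psi_{ii'}P_3 + \delta_{ii'}L^{-1}\sigma_e^2U$ can be written

(6.9)  $$\Lambda_{ii'} = ((\text{diag. } \Gamma_{ii'}, \Delta_{ii'}, \dots, \Delta_{ii'})),$$

the first diagonal block corresponding to $j = 1$ and the remaining $J - 1$ blocks to $j = 2, \dots, J$, where

(6.10)  $$\Gamma_{ii'} = ((\text{diag. } r_{ii'}, v_{ii'}, \dots, v_{ii'})), \qquad \Delta_{ii'} = ((\text{diag. } u_{ii'}, w_{ii'}, \dots, w_{ii'}))$$

are submatrices of size $K$, the elements of which are defined by the relations

(6.11)  $$\sigma_{ii'}^* = \sigma_{ii'} + \delta_{ii'}L^{-1}\sigma_e^2$$

and

(6.12)
$$r_{ii'} = \sigma_{ii'}^* + (K-1)\nu_{ii'} + (J-1)\tau_{ii'}, \qquad u_{ii'} = \sigma_{ii'}^* + (K-1)\nu_{ii'} - \tau_{ii'},$$
$$v_{ii'} = \sigma_{ii'}^* - \nu_{ii'} + (J-1)\tau_{ii'}, \qquad w_{ii'} = \sigma_{ii'}^* - \nu_{ii'} - \tau_{ii'}.$$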
Define the matrices of size $I$

(6.13)  $$R_0 = ((r_{ii'})), \qquad U_0 = ((u_{ii'})), \qquad V_0 = ((v_{ii'})), \qquad W_0 = ((w_{ii'})).$$

Then

(6.14)  $$R_0 + W_0 = U_0 + V_0.$$
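Relations (6.11)-(6.12) and the identity (6.14) can be checked elementwise. The sketch below maps hypothetical values of $\sigma_{ii'}$, $\nu_{ii'}$, $\tau_{ii'}$ and $\sigma_e^2$, chosen for illustration only, to the four matrices of (6.13). It then confirms that $R_0 + W_0 - U_0 - V_0$ vanishes up to rounding, as it must, since $r + w$ and $u + v$ both equal $2\sigma^* + (K-2)\nu + (J-2)\tau$.

```python
def reduced_covariances(sigma, nu, tau, sigma2_e, J, K, L):
    """Elements of R0, U0, V0, W0 from (6.11)-(6.12), as lists of rows."""
    I = len(sigma)
    R0 = [[0.0] * I for _ in range(I)]
    U0 = [[0.0] * I for _ in range(I)]
    V0 = [[0.0] * I for _ in range(I)]
    W0 = [[0.0] * I for _ in range(I)]
    for p in range(I):
        for q in range(I):
            s_star = sigma[p][q] + (sigma2_e / L if p == q else 0.0)   # (6.11)
            R0[p][q] = s_star + (K - 1) * nu[p][q] + (J - 1) * tau[p][q]
            U0[p][q] = s_star + (K - 1) * nu[p][q] - tau[p][q]
            V0[p][q] = s_star - nu[p][q] + (J - 1) * tau[p][q]
            W0[p][q] = s_star - nu[p][q] - tau[p][q]
    return R0, U0, V0, W0

# Hypothetical parameter values for I = 2 (illustration only).
sigma = [[3.0, 1.0], [1.0, 2.5]]
nu = [[1.0, 0.4], [0.4, 0.8]]
tau = [[0.6, 0.1], [0.1, 0.9]]
R0, U0, V0, W0 = reduced_covariances(sigma, nu, tau, sigma2_e=0.5, J=4, K=3, L=2)
gap = max(abs(R0[p][q] + W0[p][q] - U0[p][q] - V0[p][q])
          for p in range(2) for q in range(2))
print(gap)   # (6.14): R0 + W0 = U0 + V0 up to rounding
```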
Define also a set of $JK$ vectors, all of dimension $I$, as follows:

(6.15)
$$R = L^{-1/2}(x_{111}, \dots, x_{i11}, \dots, x_{I11})',$$
$$U_j = L^{-1/2}(x_{1j1}, \dots, x_{ij1}, \dots, x_{Ij1})', \quad j > 1,$$
$$V_k = L^{-1/2}(x_{11k}, \dots, x_{i1k}, \dots, x_{I1k})', \quad k > 1,$$
$$W_{jk} = L^{-1/2}(x_{1jk}, \dots, x_{ijk}, \dots, x_{Ijk})', \quad j, k > 1.$$

Equations (6.8), (6.9) and (6.10) show that these $JK$ vectors are uncorrelated. Their covariance matrices are respectively $R_0$, $U_0$, $V_0$ and $W_0$ and, according to (6.7),

(6.16)  $$EU_j = EV_k = EW_{jk} = 0, \qquad j, k > 1.$$


The quadratic form in $Z^{(1)}$ in the right-hand member of (6.5) can then be written, as one easily verifies,

(6.17)
$$(Z^{(1)} - EZ^{(1)})'\Sigma_{Z^{(1)}}^{-1}(Z^{(1)} - EZ^{(1)}) = (X - EX)'\Sigma_X^{-1}(X - EX)$$
$$= (R - ER)'R_0^{-1}(R - ER) + \sum_{j>1}U_j'U_0^{-1}U_j + \sum_{k>1}V_k'V_0^{-1}V_k + \sum_{j>1}\sum_{k>1}W_{jk}'W_0^{-1}W_{jk},$$

the last three terms of which also equal

(6.18)  $$\operatorname{tr}\Big(U_0^{-1}\sum_{j>1}U_jU_j' + V_0^{-1}\sum_{k>1}V_kV_k' + W_0^{-1}\sum_{j>1}\sum_{k>1}W_{jk}W_{jk}'\Big).$$


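When normality is assumed it follows from this, (6.5) and the Neyman factorization theorem that a sufficient statistic for the parameters of the model is

(6.19)  $$T = \{s^2, \hat\beta, \hat U_0, \hat V_0, \hat W_0\},$$

where

(6.20)
$$s^2 = [IJK(L-1)]^{-1}\sum_i\sum_j\sum_k\sum_{l>1}(z_{ijkl}^{(1)})^2, \qquad \hat\beta = (JK)^{-1/2}R,$$
$$\hat U_0 = (J-1)^{-1}\sum_{j>1}U_jU_j', \qquad \hat V_0 = (K-1)^{-1}\sum_{k>1}V_kV_k',$$
$$\hat W_0 = [(J-1)(K-1)]^{-1}\sum_{j>1}\sum_{k>1}W_{jk}W_{jk}'.$$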
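Using the intermediary relations (6.6), (6.7) and applying (2.2), one finds after some straightforward algebra that $T$ can be expressed in terms of the observations as follows:

(6.21)
$$s^2 = MS_e, \qquad \hat\beta_i = y_{i\cdot\cdot\cdot},$$
$$\hat u_{ii'} = (J-1)^{-1}K\sum_j (y_{ij\cdot\cdot} - y_{i\cdot\cdot\cdot})(y_{i'j\cdot\cdot} - y_{i'\cdot\cdot\cdot}),$$
$$\hat v_{ii'} = (K-1)^{-1}J\sum_k (y_{i\cdot k\cdot} - y_{i\cdot\cdot\cdot})(y_{i'\cdot k\cdot} - y_{i'\cdot\cdot\cdot}),$$
$$\hat w_{ii'} = [(J-1)(K-1)]^{-1}\sum_j\sum_k (y_{ijk\cdot} - y_{ij\cdot\cdot} - y_{i\cdot k\cdot} + y_{i\cdot\cdot\cdot})(y_{i'jk\cdot} - y_{i'j\cdot\cdot} - y_{i'\cdot k\cdot} + y_{i'\cdot\cdot\cdot}),$$

where we write $\hat\beta = (\hat\beta_1, \dots, \hat\beta_I)'$, $\hat U_0 = ((\hat u_{ii'}))$, $\hat V_0 = ((\hat v_{ii'}))$, $\hat W_0 = ((\hat w_{ii'}))$.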
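The formulas (6.21) translate directly into code. In the pure-Python sketch below, the balanced data are assumed to be stored as nested lists `y[i][j][k][l]`, with 0-based indices; this layout is chosen only for illustration. The usage lines draw hypothetical normal data. A useful consistency check follows from (6.21) and the definition of $MS_B$: averaging $\hat u_{ii'}$ over $i$ and $i'$ gives $MS_B = IL\,\hat u_{\cdot\cdot}$.

```python
import random

def sufficient_statistics(y):
    """s^2, beta-hat and the matrices U0-hat, V0-hat, W0-hat of (6.21).

    y[i][j][k] is the list of the L replicates of cell (i, j, k); L >= 2 is needed for s^2.
    """
    I, J, K, L = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    cell = [[[sum(y[i][j][k]) / L for k in range(K)] for j in range(J)] for i in range(I)]
    yij = [[sum(cell[i][j]) / K for j in range(J)] for i in range(I)]          # y_{ij..}
    yik = [[sum(cell[i][j][k] for j in range(J)) / J for k in range(K)]
           for i in range(I)]                                                  # y_{i.k.}
    yi = [sum(yij[i]) / J for i in range(I)]                                   # y_{i...}
    ss_e = sum((x - cell[i][j][k]) ** 2 for i in range(I) for j in range(J)
               for k in range(K) for x in y[i][j][k])
    s2 = ss_e / (I * J * K * (L - 1))                                          # MS_e
    g = [[yij[i][j] - yi[i] for j in range(J)] for i in range(I)]
    h = [[yik[i][k] - yi[i] for k in range(K)] for i in range(I)]
    d = [[[cell[i][j][k] - yij[i][j] - yik[i][k] + yi[i] for k in range(K)]
          for j in range(J)] for i in range(I)]
    U = [[K * sum(g[p][j] * g[q][j] for j in range(J)) / (J - 1)
          for q in range(I)] for p in range(I)]
    V = [[J * sum(h[p][k] * h[q][k] for k in range(K)) / (K - 1)
          for q in range(I)] for p in range(I)]
    W = [[sum(d[p][j][k] * d[q][j][k] for j in range(J) for k in range(K))
          / ((J - 1) * (K - 1)) for q in range(I)] for p in range(I)]
    return s2, yi, U, V, W

# Hypothetical balanced data: I = 3, J = 4, K = 5, L = 2.
rng = random.Random(0)
I, J, K, L = 3, 4, 5, 2
y = [[[[rng.gauss(float(i), 1.0) for _ in range(L)] for _ in range(K)]
      for _ in range(J)] for i in range(I)]
s2, beta_hat, U0_hat, V0_hat, W0_hat = sufficient_statistics(y)
print(s2, beta_hat)
```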


7. Unbiased point estimators of the parameters. The results of the previous section enable us to find unbiased point estimators of the basic parameters of the model, namely $\mu$, $\alpha_i^A$, $\sigma_e^2$, $\sigma_{ii'}$, $\nu_{ii'}$, $\tau_{ii'}$, $i, i' = 1, \dots, I$. We shall also prove that if $J, K > I$ and normality is assumed, these estimators are minimum variance unbiased.


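Unbiased estimators for $\mu$, $\alpha_i^A$ and $\sigma_e^2$ are at once found to be

(7.1)  $$\hat\mu = y_{\cdot\cdot\cdot\cdot}, \qquad \hat\alpha_i^A = y_{i\cdot\cdot\cdot} - y_{\cdot\cdot\cdot\cdot}, \qquad \hat\sigma_e^2 = s^2 = MS_e.$$

Also, as noticed above (6.16), $\Sigma_{U_j} = U_0$, $j > 1$. This, together with (6.16), (6.20) and the similar relations for $V$ and $W$, shows that

(7.2)  $$E\hat U_0 = U_0, \qquad E\hat V_0 = V_0, \qquad E\hat W_0 = W_0.$$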


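Solving the last three equations of (6.12) for the unknown parameters $\sigma_{ii'}$, $\nu_{ii'}$, $\tau_{ii'}$, and substituting in the resulting equalities the estimators $s^2$, $\hat u_{ii'}$, $\hat v_{ii'}$ and $\hat w_{ii'}$ for $\sigma_e^2$, $u_{ii'}$, $v_{ii'}$ and $w_{ii'}$, one obtains the unbiased estimators

(7.3)
$$\hat\sigma_{ii'} = (JK)^{-1}[(JK - J - K)\hat w_{ii'} + J\hat u_{ii'} + K\hat v_{ii'}] - \delta_{ii'}L^{-1}s^2,$$
$$\hat\nu_{ii'} = K^{-1}(\hat u_{ii'} - \hat w_{ii'}), \qquad \hat\tau_{ii'} = J^{-1}(\hat v_{ii'} - \hat w_{ii'}).$$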
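Since (7.3) simply inverts the last three equations of (6.12), a round trip makes a convenient check. The sketch below maps chosen, purely hypothetical, values of $\sigma_{ii'}$, $\nu_{ii'}$, $\tau_{ii'}$ and $\sigma_e^2$ to $u$, $v$, $w$ through (6.11)-(6.12). It then applies (7.3) with $s^2 = \sigma_e^2$ and the exact $u$, $v$, $w$ in place of their estimators, and confirms that the original parameters are recovered.

```python
def estimate_basic_parameters(U0_hat, V0_hat, W0_hat, s2, J, K, L):
    """Unbiased estimators (7.3) of sigma_{ii'}, nu_{ii'} and tau_{ii'}."""
    I = len(U0_hat)
    sig = [[0.0] * I for _ in range(I)]
    nu = [[0.0] * I for _ in range(I)]
    tau = [[0.0] * I for _ in range(I)]
    for p in range(I):
        for q in range(I):
            u, v, w = U0_hat[p][q], V0_hat[p][q], W0_hat[p][q]
            sig[p][q] = (((J * K - J - K) * w + J * u + K * v) / (J * K)
                         - (s2 / L if p == q else 0.0))
            nu[p][q] = (u - w) / K
            tau[p][q] = (v - w) / J
    return sig, nu, tau

# Round trip: map chosen (hypothetical) parameters to u, v, w with (6.11)-(6.12),
# then check that (7.3) recovers them.
J, K, L, s2 = 4, 3, 2, 0.5
sigma = [[3.0, 1.0], [1.0, 2.5]]
nu_true = [[1.0, 0.4], [0.4, 0.8]]
tau_true = [[0.6, 0.1], [0.1, 0.9]]

def uvw(p, q):
    s_star = sigma[p][q] + (s2 / L if p == q else 0.0)
    return (s_star + (K - 1) * nu_true[p][q] - tau_true[p][q],
            s_star - nu_true[p][q] + (J - 1) * tau_true[p][q],
            s_star - nu_true[p][q] - tau_true[p][q])

U0 = [[uvw(p, q)[0] for q in range(2)] for p in range(2)]
V0 = [[uvw(p, q)[1] for q in range(2)] for p in range(2)]
W0 = [[uvw(p, q)[2] for q in range(2)] for p in range(2)]
sig_hat, nu_hat, tau_hat = estimate_basic_parameters(U0, V0, W0, s2, J, K, L)
print(sig_hat, nu_hat, tau_hat)
```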







Performing some algebra one verifies also that the estimators (4.4) of the components of variance coincide with the estimators one would obtain by substituting $\hat\sigma_{ii'}$, $\hat\nu_{ii'}$ and $\hat\tau_{ii'}$ for $\sigma_{ii'}$, $\nu_{ii'}$ and $\tau_{ii'}$ in the relations (3.9). This remark is needed to conclude that the estimators (4.4) also possess the optimum property to be considered now.

Assume that normality holds and $J, K > I$. Then the estimators (4.4), (7.1) and (7.3) are minimum variance unbiased. To see this, the multivariate extension of a completeness lemma of Gautschi [4] is needed.

Lemma 5. Let $\Theta$ be a parameter vector and $Y$ a random vector in Euclidean space $E_n$; similarly, let $\Theta_1$ and $Y_1$ be vectors in $E_{n_1}$. Assume that $Y$ and $Y_1$ have probability densities (with respect to Lebesgue measure) of the form

$$p(Y, \Theta) = g(\Theta)h(Y)\exp[\Theta'Y],$$
$$r(Y_1, \Theta_1, \Theta) = f(\Theta_1, \Theta)\exp[Y_1'R(\Theta)Y_1 + \Theta_1'Y_1],$$

where $R(\Theta)$ is a matrix of size $n_1$. Let the domain $D$ of $\Theta$ contain a nondegenerate interval in $E_n$ and the domain of $\Theta_1$ be $E_{n_1}$. Then the family of product measures on $E_{n+n_1}$ generated by the family of probability densities

$$\mathcal{F} = \{p(Y, \Theta)\,r(Y_1, \Theta_1, \Theta) : (\Theta, \Theta_1) \in D \times E_{n_1}\}$$

is strongly complete, in the sense of Lehmann and Scheffé [7].

The proof of this can be made along exactly the same lines as in the univariate case [4]. Consider now the probability density of the statistic $T$ defined by (6.19). The vectors (6.15) being independent, $s^2$, $\hat\beta$, $\hat U_0$, $\hat V_0$ and $\hat W_0$ are also independent. Now $IJK(L-1)s^2$ is $\sigma_e^2\chi^2_{IJK(L-1)}$, $\hat\beta$ is $N(\beta, J^{-1}K^{-1}R_0)$ and, when $J, K > I$, $\hat U_0$, $\hat V_0$, $\hat W_0$ have respectively the $W([J-1]^{-1}U_0, J-1)$, $W([K-1]^{-1}V_0, K-1)$ and $W([(J-1)(K-1)]^{-1}W_0, (J-1)(K-1))$ distributions, where $W(\Sigma, n)$ denotes the Wishart distribution of the matrix $\sum_{\alpha=1}^{n} Y_\alpha Y_\alpha'$ for $n$ independent identically distributed vectors $Y_1, \dots, Y_n$, each $N(0, \Sigma)$. Let $-2\Theta$ be a vector in which the components of the matrices $(J-1)U_0^{-1}$, $(K-1)V_0^{-1}$, $(J-1)(K-1)W_0^{-1}$ and $\sigma_e^{-2}$ are strung out, $Y$ a vector in which the components of the matrices $\hat U_0$, $\hat V_0$, $\hat W_0$ and $s^2$ are correspondingly strung out, $R(\Theta) = -\tfrac{1}{2}JKR_0^{-1}$, $\Theta_1 = JK\beta'R_0^{-1}$ and $Y_1 = \hat\beta$. By writing it out fully, one can then verify that the density $p_T$ of $T$becomes

(7.4)  $$p_T = p(Y, \Theta)\,r(Y_1, \Theta_1, \Theta),$$

where the two factors are of the type considered in Lemma 5. The family of probability measures generated by the densities (7.4) is, therefore, strongly complete, and the unbiased estimators (4.4), (7.1) and (7.3), which are functions of $T$, are minimum variance unbiased ([6], Theorem 5.1).

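When normality is assumed, the variances of the estimators $\hat\sigma_e^2$ and $\hat\alpha_i^A$ can be estimated unbiasedly. One verifies at once that $\operatorname{Var}\hat\sigma_e^2$ has the unbiased estimator $2\hat\sigma_e^4/(\nu_e + 2)$, where $\nu_e = IJK(L-1)$. One can show that an unbiased estimator of the variance of $\hat\alpha_i^A$ is

(7.5)  $$\hat\sigma^2_{\hat\alpha_i} = J^{-1}(J-1)^{-1}R_i^2 + K^{-1}(K-1)^{-1}S_i^2 - (JK)^{-1}(J-1)^{-1}(K-1)^{-1}T_i^2,$$

where

$$R_i^2 = \sum_j (y_{ij\cdot\cdot} - y_{i\cdot\cdot\cdot} - y_{\cdot j\cdot\cdot} + y_{\cdot\cdot\cdot\cdot})^2, \qquad S_i^2 = \sum_k (y_{i\cdot k\cdot} - y_{i\cdot\cdot\cdot} - y_{\cdot\cdot k\cdot} + y_{\cdot\cdot\cdot\cdot})^2,$$
$$T_i^2 = \sum_j\sum_k (y_{ijk\cdot} - y_{ij\cdot\cdot} - y_{i\cdot k\cdot} - y_{\cdot jk\cdot} + y_{i\cdot\cdot\cdot} + y_{\cdot j\cdot\cdot} + y_{\cdot\cdot k\cdot} - y_{\cdot\cdot\cdot\cdot})^2.$$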


8. A test for the hypothesis $H_A$. In a practical situation to which the model
equation (3.1) and our basic assumptions, including the normality assumption,
can be applied, the statistical hypothesis of most interest is likely to be that of
no fixed main effects, namely $H_A: \alpha_1 = \cdots = \alpha_I = 0$. In the model involving
only two factors, Scheffé [10] shows that a $T^2$ statistic can be constructed for
testing $H_A$. The extension of his procedure to the present model would require
that the $JK$ vectors $(y_{1jk\cdot}, \dots, y_{Ijk\cdot})'$, $j = 1, \dots, J$, $k = 1, \dots, K$, be independent.
As (6.3) shows, this does not hold unless two of its covariance terms vanish for all $i, i'$,
an additional assumption which cannot be justified with the present model. The
likelihood ratio principle is of no help here either, as already remarked by Wilks
([14], p. 259) in a simpler situation. This is due to the fact that the covariance
matrix defined by (6.3) is not diagonal and that $H_A$ does not completely specify
$E y_{ijk\cdot}$. One can avoid this indeterminacy by following a suggestion of Hsu [5],
namely by introducing the differences


(8.1) $d_{rjk\cdot} = y_{rjk\cdot} - y_{Ijk\cdot}$, $r = 1, \dots, I-1$, all $j, k$.
Using a notation analogous to that of Section 6, let
(8.2) $z^*_{rjk} = z_{rjk} - z_{Ijk}$, $r = 1, \dots, I-1$, all $j, k$.


We define vectors $R^*$, $U_j^*$, $V_k^*$, $W_{jk}^*$ $(j, k > 1)$ by relations similar to (6.15)
but with $z^*_{rjk}$ substituted for $z_{ijk}$, e.g.,


(8.3) $R^* = L^{1/2}(z^*_{111}, \dots, z^*_{r11}, \dots, z^*_{I-1,1,1})'$.


These $JK$ vectors are thus of dimension $I-1$ only; like the vectors (6.15),
they are independent. Their covariance matrices are respectively


(8.4) $R_0^* = ((r^*_{rr'})) = ((r_{rr'} - r_{rI} - r_{Ir'} + r_{II}))$, $r, r' = 1, \dots, I-1$,


and $U_0^*$, $V_0^*$, $W_0^*$, whose elements $u^*_{rr'}$, $v^*_{rr'}$ and $w^*_{rr'}$ are similarly defined in
terms of the $u_{ii'}$, $v_{ii'}$ and $w_{ii'}$ of (6.12). Let also


(8.5) $\beta^* = (\beta_1 - \beta_I, \dots, \beta_{I-1} - \beta_I)'$, $\hat\beta^* = (JK)^{-1/2}R^*$,














922 J. P. IMHOF 

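so that $E\hat\beta^* = \beta^*$, and define as in (6.20) the matrices

(8.6) $\hat U_0^* = (J-1)^{-1}\sum_{j>1} U_j^*U_j^{*\prime}$, $\hat V_0^* = (K-1)^{-1}\sum_{k>1} V_k^*V_k^{*\prime}$,
$\hat W_0^* = [(J-1)(K-1)]^{-1}\sum_{j>1}\sum_{k>1} W_{jk}^*W_{jk}^{*\prime}$.

Formulas (6.14) and (7.2) now become
(8.7) $R_0^* + W_0^* = U_0^* + V_0^*$,
(8.8) $E\hat U_0^* = U_0^*$, $E\hat V_0^* = V_0^*$, $E\hat W_0^* = W_0^*$.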


The elements $\hat u^*_{rr'}$, $\hat v^*_{rr'}$ and $\hat w^*_{rr'}$ of the matrices (8.6) can be computed from the
observations by using relations analogous to (6.21) but with $d_{rjk\cdot}$ substituted
for $y_{ijk\cdot}$. Alternatively,


$\hat u^*_{rr'} = \hat u_{rr'} - \hat u_{rI} - \hat u_{Ir'} + \hat u_{II}$, $r, r' = 1, \dots, I-1$,


with similar relations in $v$ and $w$.
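The double-difference rule in (8.4) and in the relation for $\hat u^*_{rr'}$ is the general fact that if an $I$-vector has covariance matrix $A = ((a_{ii'}))$, the $I-1$ differences from its last component have covariance $CAC'$, where row $r$ of $C$ is $e_r - e_I$. The following is a minimal stdlib-only Python sketch that computes the matrix both ways; the function names and the example matrix are hypothetical and serve only as illustration.

```python
def difference_covariance(A):
    """Elementwise rule of (8.4): a_rr' - a_rI - a_Ir' + a_II, for r, r' < I."""
    last = len(A) - 1  # index of the reference level I
    return [[A[r][s] - A[r][last] - A[last][s] + A[last][last]
             for s in range(last)] for r in range(last)]


def contrast_covariance(A):
    """The same matrix computed as C A C', where row r of C is e_r - e_I."""
    I = len(A)
    C = [[(1.0 if c == r else 0.0) - (1.0 if c == I - 1 else 0.0) for c in range(I)]
         for r in range(I - 1)]
    CA = [[sum(C[r][k] * A[k][c] for k in range(I)) for c in range(I)]
          for r in range(I - 1)]
    return [[sum(CA[r][k] * C[s][k] for k in range(I)) for s in range(I - 1)]
            for r in range(I - 1)]


# Illustrative 3 x 3 covariance matrix (I = 3)
A = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]
print(difference_covariance(A))
```

The elementwise form never builds $C$, which is why it is the convenient one for hand computation.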


If the covariance matrix $R_0^*$ were known, one would test the hypothesis $H_A$ by
using the criterion


(8.9) $T_1^2 = R^{*\prime}(R_0^*)^{-1}R^*$,


which has the noncentral $\chi^2$ distribution with $I-1$ d.f. and noncentrality
parameter


(8.10) $\delta_1^2 = JK\beta^{*\prime}(R_0^*)^{-1}\beta^*$,
which reduces to zero under $H_A$.

When $R_0^*$ is unknown, one might think of using instead of $T_1^2$ the criterion
$\hat T_1^2 = R^{*\prime}(\hat R_0^*)^{-1}R^*$, where $\hat R_0^*$ is the unbiased estimate of $R_0^*$ based on the
sufficient statistic $T$ of (6.19), i.e., by (8.7), (8.8), $\hat R_0^* = \hat U_0^* + \hat V_0^* - \hat W_0^*$.
Although $R^*$, $\hat U_0^*$, $\hat V_0^*$, $\hat W_0^*$ are mutually independent, the former with a multivariate
normal and the latter three with Wishart distributions, it does not seem
possible to obtain the distribution of $\hat T_1^2$. The case $I = 2$ easily shows that it
does, under $H_A$, depend on nuisance parameters and, unlike a $T^2$ statistic, need
not be nonnegative. However, when both $J$ and $K$ tend to infinity, one verifies
that the limiting distribution of $\hat T_1^2$ under $H_A$ is the $\chi^2_{I-1}$ distribution. Hence, for
large values of both $J$ and $K$, a satisfactory test of $H_A$ at the $\alpha$ level of significance
consists in rejecting the hypothesis if $\hat T_1^2$ exceeds the upper $\alpha$-point of the
$\chi^2_{I-1}$ distribution. As is well known, one does not need to compute $(\hat R_0^*)^{-1}$ in
order to evaluate $\hat T_1^2$, but can use a formula similar to (8.21) below.

Consider now the case where $J$ and $K$ are not large enough to justify the use
of the $\chi^2$ approximation. Assume, however, that $J, K \ge I$ (in Section 7, where
we had vectors of dimension $I$, we assumed $J, K > I$; here the vectors have
dimension $I-1$, so we need only $J, K \ge I$). The fact that several unknown
covariance matrices are involved in our model suggests trying to apply a device
similar to the one proposed by Scheffé [8] for solving the Behrens-Fisher problem:
instead of using the estimate $\hat R_0^*$ of $R_0^*$ in the definition of $\hat T_1^2$, one would
like to use one which has a Wishart distribution. This would require finding 
independent identically distributed vectors S, , --- , S, , each a linear combina- 
tion of the vectors of observations (dix, -+- , drag)’, 7 = 1, °°, J, k = 1, 
.»+ | K, and each having mean zero and covariance matrix R? . One would wish 
to do so for n as close as possible to JK — 1, which is as much as one can achieve 
when said vectors of observations are independent identically distributed. But 
S:, --:, S, would then also have to be linear combinations of the vectors 
R*, US, V2 and W%,, j, k > 1. Now because of the minus sign in RO = 
Us + Vo — We, it is clear that no such linear combination except R* itself 
has covariance matrix R? . 

It appears that the only way to construct a test criterion which has under $H_A$
a distribution free of nuisance parameters consists in looking not at the minimum
variance unbiased estimate $R^*$ of $(JK)^{1/2}\beta^*$, but at another unbiased
estimate of it, namely


(8.11) $(JK)^{1/2}S = R^* + [(J-1)(K-1)]^{-1/2}\sum_{j>1}\sum_{k>1} W_{jk}^*$.


Although this will result in a loss of power of the test obtained below as compared
with the “ideal” test described above, this is the price one has to pay for
allowing three unknown covariance matrices in the model. The vector $(JK)^{1/2}S$
has the $N((JK)^{1/2}\beta^*, R_0^* + W_0^*)$ distribution. Let


(8.12) $H = (M-1)^{-1}\sum_{m=2}^{M}(U_m^* + V_m^*)(U_m^* + V_m^*)'$,


where $M = \min(J, K)$; then independence of the vectors $R^*$, $U_j^*$, $V_k^*$, $W_{jk}^*$,
together with (8.7), shows at once that a $T^2$ criterion for testing $H_A$ is


(8.13) $T^2 = JK\,S'H^{-1}S$.


More precisely, $\mathcal F = (M-1)^{-1}(I-1)^{-1}(M-I+1)T^2$ has under $H_A$ the
$F$-distribution with $I-1$ and $M-I+1$ d.f. Under alternatives it has the
corresponding noncentral $F$-distribution (defined in [10], formula (82)) with
noncentrality parameter


(8.14) $\delta^2 = JK\beta^{*\prime}(R_0^* + W_0^*)^{-1}\beta^*$.


The test consists in rejecting $H_A$ if $\mathcal F > F_\alpha$, the upper $\alpha$-point of the $F$-distribution
with the above numbers of d.f., $\alpha$ being the level of significance. The
“ideal” test we were imagining above would have a distribution with noncentrality
parameter $\delta_1^2$ given by (8.10). Going back to formulas (6.12) we see
that if the differences of — vu — ta are small compared to Ky + Jriv, 
then $\delta^2$ should not be appreciably smaller than $\delta_1^2$, and so the loss in power due to
the larger variance of $(JK)^{1/2}S$ should not be too considerable. Some better
insight into this will be obtained in connection with (8.23). However, if
$M - I + 1$ is small, say it equals only 2 or 3, then the drastic curtailing in the







number of d.f. “for error” as compared with the “ideal” number $JK - I + 1$
will make the test a rather poor one.
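The computation in (8.12) and (8.13), followed by the conversion of $T^2$ to the $F$ ratio, can be sketched as below. This stdlib-only Python sketch assumes that the vectors $f_m = U_m^* + V_m^*$ $(m = 2, \dots, M)$ and the vector $S$ of (8.11) have already been formed; the helper names and the numbers in the usage line are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def hotelling_T2_and_F(S, f_vectors, J, K):
    """T^2 = JK S' H^{-1} S with H as in (8.12); returns (T^2, F, (df1, df2))."""
    p = len(S)                 # p = I - 1
    M = len(f_vectors) + 1     # f_vectors holds m = 2, ..., M, with M = min(J, K)
    H = [[sum(f[r] * f[s] for f in f_vectors) / (M - 1) for s in range(p)]
         for r in range(p)]
    x = solve(H, S)            # H^{-1} S without forming the inverse
    T2 = J * K * sum(S[r] * x[r] for r in range(p))
    I = p + 1
    F = (M - I + 1) / ((M - 1) * (I - 1)) * T2
    return T2, F, (I - 1, M - I + 1)


# Made-up numbers: I = 2, J = K = 4, hence M = 4 and three vectors f_m
T2, F, df = hotelling_T2_and_F([0.5], [[1.0], [-1.0], [2.0]], J=4, K=4)
print(T2, F, df)
```

Solving for $H^{-1}S$ rather than inverting $H$ is the usual numerically safer choice. The resulting $F$ still has to be compared with the upper $\alpha$-point of the $F$-distribution with $I-1$ and $M-I+1$ d.f., taken from tables or a statistics library.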


We now express the criterion $T^2$ in terms of the observations $y_{ijkl}$. Using


(6.6), (6.7) and the values of $p_j$ in Lemma 1 with the appropriate values for $n$
yields


(8.15) 29h = Lyin. — (K* — 1)*(KYy;;.. — yia-)] for k > 1, all i, j. 
Similarly one has 
(8.16) zine = 29 — (J' — 1) “(sf — 2) for (j,k) ¥ (1,1), all. 
The above two formulas and (6.7) easily yield 
tau = (JKL)*y... , 
ta = (JL)"[yee. — (K* — 1)7*( Kyi... — yir-)], 
tin = (KL)*{y;;.. — (J' — 1)7*(J4y:... — ya), 
tin = Lyi. — (J? — 1) (dyer, —yar-) 

— (KP — 1)"(Kyi;.. — yia-) 

+ (J* — 1)"(K* — 1)7*((JK)'y... 

— (J ya. — Kiya. + yur); 


(8.17) 


The equations (8.17) also imply 


> a tisk = JX (eith — zh), 


j>1 kl 

from which one finds by (8.15) 

(8.18) 22, tH = (JKL)*(yar. — ya — Yas + You). 
Let, for convenience, in (8.11), (8.12),


(8.19) $S = (s_1, \dots, s_r, \dots, s_{I-1})'$, $H = (M-1)^{-1}G$, $G = ((g_{rr'}))$, $g_{rr'} = \sum_{m=2}^{M} f_{rm}f_{r'm}$,


where $f_{rm}$ is the $r$th component of the vector $U_m^* + V_m^*$. Relations (8.1), (8.2),
(8.3), (8.17) and (8.18) show that one can write, for $r = 1, \dots, I-1$ and $m = 2,
\dots, M$,


8 = d,. + [(J —1)(K — 1)] (dai — du. — dea + d,..), 
(8.20) fem = K' dem. + J' dpm — K*(J* — 1)7(J' d,.. — du) 
— J*(K' — 1)7(K' d,.. — d,.). 










To compute $T^2$ it is not necessary to invert the matrix $G$: one can use instead of
(8.13) the relation


(8.21) $T^2 = JK(M-1)\left[\dfrac{|g_{rr'} + s_rs_{r'}|}{|g_{rr'}|} - 1\right]$.

There is in general no shortcut available for the computation of $T^2$: one has to
compute separately the $s_r$ and $f_{rm}$ from (8.20), then use (8.19) to compute $T^2$
from (8.21). If, however, $J = K = M$, one verifies that


M 
gn = Md (drm. + dy.m — 2dy..) (dem. + dy.m — Qdy..), 


so that the computational work required is considerably reduced. 
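Relation (8.21) is an instance of the determinant identity $|G + SS'| = |G|(1 + S'G^{-1}S)$, which is why no inverse is needed. Below is a minimal stdlib-only Python sketch. The helper names and the numbers are illustrative assumptions.

```python
def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(A)
    a = [list(row) for row in A]
    d = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d                     # a row swap flips the sign
        d *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return d


def T2_from_determinants(S, G, J, K, M):
    """(8.21): T^2 = JK (M - 1) [ |g_rr' + s_r s_r'| / |g_rr'| - 1 ]."""
    p = len(S)
    G_plus = [[G[r][s] + S[r] * S[s] for s in range(p)] for r in range(p)]
    return J * K * (M - 1) * (det(G_plus) / det(G) - 1.0)


# Illustrative values: S' G^{-1} S = 1/2 + 4/4 = 1.5, so T^2 = 2 * 2 * (3 - 1) * 1.5
T2 = T2_from_determinants([1.0, 2.0], [[2.0, 0.0], [0.0, 4.0]], J=2, K=2, M=3)
print(T2)
```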

There is in principle no difficulty in evaluating the power of the test against a
specific alternative. All that is needed is the value of the noncentrality parameter
$\delta^2$ of (8.14). On the other hand, specifying an alternative requires specifying the
value of $R_0^* + W_0^* = U_0^* + V_0^*$. This in general would have to be estimated
from the data by using the estimates (8.6). Unless $J = K = M$, in which case
Os + Ve = (M — 1)", it might require a prohibitive amount of additional 
computations. One might then be satisfied to determine the power of the test
against a “simplified” alternative, namely one obtained by assuming that the
three basic covariance matrices (3.3) all have the structure (2.1). From (6.11),
(6.12) and the relations analogous to (8.4), the $r, r'$-element of $U_0^* + V_0^*$ is
found to be, under this simplifying assumption,


Pe = (1 + 8,7) (Le? + Yo - 1); 


where yo = 2p + Koi + Jr = 2p + Ko + Jr , i # 0. Using (3.9) 
this becomes 


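$\gamma^*_{rr'} = (1 + \delta_{rr'})(K\sigma^2_{AB} + J\sigma^2_{AC} + 2\sigma^2_{ABC} + L^{-1}\sigma^2)$.
The matrix $((\gamma^*_{rr'}))^{-1} = ((\gamma^{*rr'}))$ can easily be computed, and one finds

$\delta^2 = JK\sum_r\sum_{r'}\beta^*_r\beta^*_{r'}\gamma^{*rr'} = JK(K\sigma^2_{AB} + J\sigma^2_{AC} + 2\sigma^2_{ABC} + L^{-1}\sigma^2)^{-1}\left[\sum_r\beta_r^{*2} - I^{-1}\left(\sum_r\beta^*_r\right)^2\right]$,

which in terms of $\alpha_1, \dots, \alpha_I$ becomes simply


(8.22) $\delta^2 = JK(K\sigma^2_{AB} + J\sigma^2_{AC} + 2\sigma^2_{ABC} + L^{-1}\sigma^2)^{-1}\sum_i\alpha_i^2$.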
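The step from the double sum to (8.22) uses $((1 + \delta_{rr'}))^{-1} = ((\delta_{rr'} - I^{-1}))$ for an $(I-1)\times(I-1)$ matrix. This sketch additionally assumes $\beta^*_r = \alpha_r - \alpha_I$ and the side condition $\sum_i\alpha_i = 0$. A stdlib-only Python check follows, where `c` stands for $K\sigma^2_{AB} + J\sigma^2_{AC} + 2\sigma^2_{ABC} + L^{-1}\sigma^2$. All names and numbers are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def delta2_via_matrix(alpha, c, J, K):
    """JK beta*' Gamma^{-1} beta*, Gamma = ((c (1 + delta_rr'))), beta*_r = alpha_r - alpha_I."""
    I = len(alpha)
    beta_star = [alpha[r] - alpha[-1] for r in range(I - 1)]
    gamma = [[c * (2.0 if r == s else 1.0) for s in range(I - 1)] for r in range(I - 1)]
    x = solve(gamma, beta_star)
    return J * K * sum(b * xr for b, xr in zip(beta_star, x))


def delta2_closed_form(alpha, c, J, K):
    """(8.22): JK c^{-1} sum_i alpha_i^2."""
    return J * K * sum(a * a for a in alpha) / c


alpha = [1.0, -0.5, -0.5]   # illustrative effects summing to zero
print(delta2_via_matrix(alpha, 2.0, 3, 4), delta2_closed_form(alpha, 2.0, 3, 4))
```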


When $\sum_i\alpha_i^2$ is specified, this value of $\delta^2$ can be very quickly estimated by using
(4.4), with


$\hat\delta^2 = JKL(MS_{AB} + MS_{AC} - MS_e)^{-1}\sum_i\alpha_i^2$.


It is interesting to compare (8.22) with the analogous formula that one obtains
when computing the value of the “ideal” noncentrality parameter $\delta_1^2$ of
(8.10) under our simplifying assumption. One finds that


(8.23) $\delta_1^2 = (1 + A)\delta^2$,
with


(8.24) $A = (K\sigma^2_{AB} + J\sigma^2_{AC} + \sigma^2_{ABC} + L^{-1}\sigma^2)^{-1}\sigma^2_{ABC}$.


The conclusion we had arrived at in the discussion below (8.14) can now be put
into the following terms: if $A$ is small compared to 1, the loss of power introduced
in the test by the undesirable last term of (8.11) should not be appreciable.
This conclusion is encouraging. In practice, $\sigma^2_{ABC}$ is often dominated by one
of $\sigma^2_{AB}$, $\sigma^2_{AC}$, which according to (8.24) should make $A$ satisfactorily small.
Estimating numerator and denominator separately, one obtains an estimate of
$A$, namely


$\hat A = (MS_{AB} + MS_{AC} - MS_{ABC})^{-1}(MS_{ABC} - MS_e)$,
which is of course not unbiased.
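Both estimates are simple arithmetic on the mean squares of the usual analysis of variance table. The following minimal Python sketch assumes that $MS_{AB}$, $MS_{AC}$, $MS_{ABC}$ and $MS_e$ are already available. The argument names are hypothetical, and the numbers are made up.

```python
def estimate_A(ms_ab, ms_ac, ms_abc, ms_e):
    """A-hat = (MS_AB + MS_AC - MS_ABC)^{-1} (MS_ABC - MS_e); a ratio estimate, not unbiased."""
    return (ms_abc - ms_e) / (ms_ab + ms_ac - ms_abc)


def estimate_delta2(sum_alpha_sq, J, K, L, ms_ab, ms_ac, ms_e):
    """delta^2-hat = JKL (MS_AB + MS_AC - MS_e)^{-1} sum_i alpha_i^2 for a specified alternative."""
    return J * K * L * sum_alpha_sq / (ms_ab + ms_ac - ms_e)


A_hat = estimate_A(10.0, 6.0, 2.0, 1.0)
d2_hat = estimate_delta2(0.5, J=3, K=4, L=2, ms_ab=10.0, ms_ac=6.0, ms_e=1.0)
print(A_hat, d2_hat)
```

By (8.23), a small $\hat A$ suggests that the "ideal" noncentrality parameter exceeds $\delta^2$ only slightly.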


9. Confidence intervals for the fixed main effects. We consider briefly in this
section various confidence statements that can be made concerning the parameters
$\alpha_1, \dots, \alpha_I$. Corresponding to the fact that no Hotelling $T^2$ test of $H_A$
could be constructed on the basis of the estimates $\hat\alpha_i$ of the $\alpha_i$'s is the fact here
that no confidence interval for $\alpha_i$ can be obtained in the ordinary manner by
using the ratio $\hat\alpha_i/\hat\sigma_{\hat\alpha_i}$. In fact, (7.5) shows that $\hat\sigma^2_{\hat\alpha_i}$ is not even a nonnegative
definite quadratic form.

By analogy with (8.11), let m; = [(J — 1)(K — 1)Z] "Zoskeotin . where 
Zijx is as in (6.15). Then 


Var (m; — m.) = Wi Wi. — W.; + wv.. = os = vis —_ rs + CU — ie te 


The unbiased estimate $(JK)^{1/2}\tilde\alpha_i = (JK)^{1/2}\hat\alpha_i + m_i - m_\cdot$ of $(JK)^{1/2}\alpha_i$ has variance
20°; + Kris + Iris + 201 — 1)I'L"'o?, as is easily verified. An unbiased 
estimate of this is $a_{ii} - 2a_{i\cdot} + a_{\cdot\cdot}$, where $a_{ii'}$ is the $i, i'$-element of the matrix
$(M-1)^{-1}\sum_{m=2}^{M}(U_m + V_m)(U_m + V_m)'$ and where, as in Section 8, $M =
\min(J, K)$. Furthermore, $a_{ii} - 2a_{i\cdot} + a_{\cdot\cdot}$ is independent of $\tilde\alpha_i$. An exact confidence
interval for the parameter $\alpha_i$ can therefore be based on the $t$-distribution
with $M-1$ d.f. of the ratio


(9.1) $[(JK)^{1/2}(\hat\alpha_i - \alpha_i) + m_i - m_\cdot]\,/\,(a_{ii} - 2a_{i\cdot} + a_{\cdot\cdot})^{1/2}$.
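Setting $|t| \le t_{\alpha/2}$ in (9.1) and solving for $\alpha_i$ gives the interval explicitly. The following stdlib-only Python sketch requires the upper $\alpha/2$-point `t_crit` of the $t$-distribution with $M-1$ d.f. to be supplied from tables. All names and numbers are illustrative.

```python
import math


def alpha_i_interval(alpha_hat_i, m_i, m_bar, a_contrast, J, K, t_crit):
    """Confidence interval for alpha_i obtained by inverting the ratio (9.1).

    a_contrast stands for a_ii - 2 a_i. + a.., the squared denominator of (9.1).
    """
    root_jk = math.sqrt(J * K)
    centre = alpha_hat_i + (m_i - m_bar) / root_jk
    half_width = t_crit * math.sqrt(a_contrast) / root_jk
    return centre - half_width, centre + half_width


lo, hi = alpha_i_interval(1.0, 0.5, 0.1, 0.16, J=2, K=2, t_crit=2.5)
print(lo, hi)
```

The centre is shifted by $(m_i - m_\cdot)/(JK)^{1/2}$, which reflects the extra term that makes the ratio exactly $t$-distributed.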


In terms of the observations one has &; = y;... — y...., then from (8.18) m; = 


[JK(J —1)"(K - 1)" (yar. —ya-. — year. + ys---), and finally from (8.17) 
ai = (M — 1) ox seimeim Where 


Cim = K*yim-- + J Yim: o Kis? _ 1)"*( J Yi. = Ya--) 


— JK = 1) (yi... — yor). 










When $J = K = M$, then $a_{ii} - 2a_{i\cdot} + a_{\cdot\cdot} = (M-1)^{-1}\sum_{m=2}^{M}(c_{im} - c_{\cdot m})^2$ reduces
to


M 
ay — 2a. + a.. = M(M — 1)" D> (yim + Yim 
mal 


— Qi — Yom — Yom + 2y....)’. 


For a single difference $\alpha_i - \alpha_{i'}$ one can proceed in a similar manner and base
an exact confidence interval on the $t$-distribution with $M-1$ d.f. of the ratio


(9.2) $[(JK)^{1/2}(y_{i\cdot\cdot\cdot} - y_{i'\cdot\cdot\cdot} - \alpha_i + \alpha_{i'}) + m_i - m_{i'}]\,/\,(a_{ii} - 2a_{ii'} + a_{i'i'})^{1/2}$.


The denominator can be computed from $a_{ii} - 2a_{ii'} + a_{i'i'} = (M-1)^{-1}\sum_{m=2}^{M}(c_{im} - c_{i'm})^2$,
which reduces when $J = K = M$ to


M 
ag — aie + Gee = M(M = 1) D0 (yam $F Ysem — BYiore — Yormes 
mod 


—Yr.m. + Zye...). 


Confidence statements based on (9.1) or (9.2) should be used only if a single
statement is made and the particular $\alpha_i$ or $\alpha_i - \alpha_{i'}$ considered has not been suggested
by the data. If several confidence statements are desired, Scheffé's method
[9] of multiple comparison can be applied when $J, K \ge I$ in a way similar to
that described in [10], but again based on the nonoptimum unbiased estimate
(8.11): we estimate a contrast $\theta = \sum_i h_i\alpha_i$ $(\sum_i h_i = 0)$ with $\hat\theta =
\sum_{r=1}^{I-1} h_r(\hat\beta^*_r + (JK)^{-1/2}m^*_r)$, where $\hat\beta^*_r = \hat\beta_r - \hat\beta_I = d_{r\cdot\cdot\cdot}$ and $m^*_r = m_r - m_I$,
$r = 1, \dots, I-1$. The variance of $\hat\theta$ is $\sigma^2(\hat\theta) = (JK)^{-1}\sum_r\sum_{r'} h_rh_{r'}(r^*_{rr'} + w^*_{rr'})$,
and it has the unbiased estimate $\hat\sigma^2(\hat\theta) = (JK)^{-1}\sum_r\sum_{r'} h_rh_{r'}a^*_{rr'}$, where $a^*_{rr'} =
a_{rr'} - a_{rI} - a_{Ir'} + a_{II}$. Then the probability is $1 - \alpha$ that the totality of contrasts
$\theta = \sum_i h_i\alpha_i$ simultaneously satisfy


(9.3) $\hat\theta - S\hat\sigma(\hat\theta) \le \theta \le \hat\theta + S\hat\sigma(\hat\theta)$,


where the constant $S$ can be computed from $F_\alpha$, the upper $\alpha$-point of the $F$-distribution
with $I-1$ and $M-I+1$ d.f., through the relation


$S^2 = (M-1)(I-1)(M-I+1)^{-1}F_\alpha$.
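The simultaneous interval (9.3) for a single contrast can be sketched as follows. This stdlib-only Python sketch assumes that the matrix $((a^*_{rr'}))$ has been formed and that $F_\alpha$ is taken from tables. The function name and the numbers are hypothetical.

```python
import math


def scheffe_interval(h, theta_hat, a_star, J, K, M, F_alpha):
    """Interval (9.3) for one contrast theta = sum_i h_i alpha_i.

    h: the first I - 1 contrast coefficients h_1, ..., h_{I-1};
    a_star: the (I-1) x (I-1) matrix ((a*_rr'));
    F_alpha: upper alpha-point of F with I - 1 and M - I + 1 d.f.
    """
    I = len(h) + 1
    S = math.sqrt((M - 1) * (I - 1) / (M - I + 1) * F_alpha)
    var_hat = sum(h[r] * h[s] * a_star[r][s]
                  for r in range(I - 1) for s in range(I - 1)) / (J * K)
    half = S * math.sqrt(var_hat)
    return theta_hat - half, theta_hat + half


lo, hi = scheffe_interval([1.0], 1.0, [[0.36]], J=3, K=3, M=3, F_alpha=9.0)
print(lo, hi)
```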


The conclusion arrived at in Section 8, that the use of the nonoptimum estimate
(8.11) in the $T^2$ criterion does not in general affect the power of the test too adversely,
implies here that the confidence intervals (9.3) and those based on
(9.1), (9.2) are not considerably lengthened by the necessary introduction
of the undesirable $m_i$ in the estimates of the $\alpha_i$.


10. Acknowledgment. The author wishes to express his gratitude to Professor 
Henry Scheffé for suggesting the present investigation and for his constant ad- 
vice and suggestions while it was in progress. Improvements in the presentation 
of the paper have also resulted from helpful comments made by the referees. 




























REFERENCES 


[1] C. A. Bennett and N. L. Franklin, Statistical Analysis in Chemistry and the Chemical
Industries, John Wiley and Sons, New York, 1954.

[2] G. E. P. Box, “Some theorems on quadratic forms applied in the study of analysis of
variance problems, I. Effect of inequality of variance in the one-way classification,”
Ann. Math. Stat., Vol. 25 (1954), pp. 290-302.

[3] Jerome Cornfield and John W. Tukey, “Average values of mean squares in factorials,”
Ann. Math. Stat., Vol. 27 (1956), pp. 907-949.

[4] Werner Gautschi, “Some remarks on Herbach's paper, ‘Optimum nature of the
F-test for model II in the balanced case’,” Ann. Math. Stat., Vol. 30 (1959), pp.
960-963.

[5] P. L. Hsu, “Notes on Hotelling's generalized T,” Ann. Math. Stat., Vol. 9 (1938), pp.
231-243.

[6] E. L. Lehmann and Henry Scheffé, “Completeness, similar regions, and unbiased
estimation. Part I,” Sankhyā, Vol. 10 (1950), pp. 305-340.

[7] E. L. Lehmann and Henry Scheffé, “Completeness, similar regions, and unbiased
estimation. Part II,” Sankhyā, Vol. 15 (1955), pp. 219-236.

[8] Henry Scheffé, “On solutions of the Behrens-Fisher problem, based on the t-distribution,”
Ann. Math. Stat., Vol. 14 (1943), pp. 35-44.

[9] Henry Scheffé, “A method for judging all contrasts in the analysis of variance,”
Biometrika, Vol. 40 (1953), pp. 87-104.

[10] Henry Scheffé, “A ‘mixed model’ for the analysis of variance,” Ann. Math. Stat.,
Vol. 27 (1956), pp. 23-36.

[11] Henry Scheffé, The Analysis of Variance, John Wiley and Sons, New York, 1959.

[12] M. B. Wilk and O. Kempthorne, “Fixed, mixed and random models,” J. Amer. Stat.
Assn., Vol. 50 (1955), pp. 1144-1178.

[13] M. B. Wilk and O. Kempthorne, “Some aspects of the analysis of factorial experiments
in a completely randomized design,” Ann. Math. Stat., Vol. 27 (1956), pp.
950-985.

[14] S. S. Wilks, “Sample criteria for testing equality of means, equality of variances, and
equality of covariances in a normal multivariate distribution,” Ann. Math.
Stat., Vol. 17 (1946), pp. 257-281.








TESTS FOR REGRESSION COEFFICIENTS WHEN ERRORS ARE 
CORRELATED 


By M. M. Siddiqui
Boulder Laboratories, National Bureau of Standards 


1. Summary. In a previous paper [6] the covariances of least-squares estimates 
of regression coefficients and the expected value of the estimate of residual 
variance were investigated when the errors are assumed to be correlated. In this 
paper we will investigate the distribution of the usual test statistics for regression 
coefficients under the same assumptions. Applications of the theory to the cases 
of testing a single sample mean, the difference between the means of two samples, 
the coefficients in a linear trend and in regression on trigonometric functions will 
be discussed in some detail under an assumed covariance matrix for errors. 


2. Introduction. Several authors have studied the effects on common tests of 
significance when one or another of the ideal conditions is not satisfied. The 
effect of correlation between errors on $t$ and $z$ tests for means has been investigated
by Daniels [3]. Box, in a series of excellent papers, including [1] and [2],
has studied the problem of unequal variances of errors and correlations between 
them in analysis of variance situations. In continuation of these investigations, 
it seemed desirable to study in some detail the distributions of common tests of 
significance, or their variations, for regression coefficients in the usual cases of 
interest. The results contained in this paper may be considered an extension of 
Daniels’, Box’s and Welch’s [7] work. 


3. Test statistics for regression coefficients. Let $y = x'\beta + \Delta$ be the observation
equation, where $y$ and $\Delta$ are $N \times 1$ column vectors, $\beta$ is a $p \times 1$ column
vector, $x$ is a $p \times N$ matrix and a prime is used to denote the transpose of a
matrix or a vector. It is assumed that $N > p$, $x$ is non-stochastic and of rank $p$,
and $\Delta$ is a $N(0, \sigma^2P)$ vector variate, where $0$ is a zero vector and $P$ is a positive
definite correlation matrix. The notation $N(a, D)$ is used for the normal distribution
with mean vector $a$ and covariance matrix $D$. $P$ will be assumed to
have a specified structure, given by (3.8), and to be known. Although, in principle,
a non-singular transformation on $y$ exists which takes one back to the
standard case, from practical considerations it seemed worthwhile to study the
effects on the usual test statistics when $\beta$ is estimated by minimizing $\Delta'\Delta$ instead
of $\Delta'P^{-1}\Delta$. Some of the reasons for doing this are given in the last section. We
further assume that $x$ is so chosen that $xx' = I_p$, the $p \times p$ identity matrix, so
that the elements $x_{ij}$ of $x$ are of the order $N^{-1/2}$. This assumption is no restriction
in principle, and, even in practice, a simple modification in $x$ may be sufficient.
For example, in the case of a linear or a polynomial trend, orthogonal polynomials
may be used; in the case of regression on the mean or on trigonometric
functions, a normalizing factor may be introduced.
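One concrete way to obtain the non-singular transformation mentioned above, assuming $P$ is known, is a Cholesky factorization $P = LL'$. Taking $D = L^{-1}$ then gives $DPD' = I_N$, so $Dy$ has uncorrelated errors. Cholesky is one convenient choice and is not necessarily what the author had in mind. The following is a minimal stdlib-only Python sketch with hypothetical function names, checked on a tridiagonal correlation matrix.

```python
import math


def cholesky(P):
    """Lower-triangular L with P = L L' (P symmetric positive definite)."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def whitening_matrix(P):
    """D = L^{-1}, built column by column with forward substitution."""
    L = cholesky(P)
    n = len(L)
    D = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(L[i][k] * D[k][col] for k in range(i))
            D[i][col] = rhs / L[i][i]
    return D


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


# Tridiagonal correlation matrix with rho_1 = 0.3, N = 5; D P D' should be I_N
N, rho = 5, 0.3
P = [[1.0 if i == j else (rho if abs(i - j) == 1 else 0.0) for j in range(N)] for i in range(N)]
D = whitening_matrix(P)
DPDt = matmul(matmul(D, P), transpose(D))
```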


Received February 22, 1960. 


929 














930 M. M. SIDDIQUI 
Writing $b = xy$, $v = y - x'b$, $n = N - p$, we recall that $b$ and $S^2 = v'v$ are
the least-squares estimates of $\beta$ and $n\sigma^2$ respectively. Now
(3.1) $b = x(x'\beta + \Delta) = \beta + x\Delta$;
hence
(3.2) $Eb = \beta$, $\sigma^2B = E(b - \beta)(b' - \beta') = \sigma^2xPx'$.
Also,
(3.3) $v = y - x'b = (I_N - x'x)\Delta = M\Delta$,


where $M = I_N - x'x$. Since $M' = M = M^2$, the characteristic roots of $M$ are
zeros or ones, and since $\operatorname{tr} M = n$, $M$ is of rank $n$. Further,


(3.4) $M_{ij} = \delta_{ij} - \sum_{s=1}^{p} x_{si}x_{sj} = \delta_{ij} + O(N^{-1})$,


and
(3.5) $Ev = 0$, $\sigma^2V = Evv' = \sigma^2MPM$, $\sigma^2R = Ev(b' - \beta') = \sigma^2MPx'$.


If $P = I_N$, then $B = I_p$, $V = M$, $R = Mx' = 0$, so that $v$ is independent of $b$.
Also $ES^2 = n\sigma^2$, and $S^2/\sigma^2$ is distributed independently of $b$ as a $\chi^2$ variate with
$n$ degrees of freedom. The usual statistic for testing a hypothesis concerning the
value of $\beta_i$ is


(3.6) $u_i = (b_i - \beta_i)n^{1/2}/S$,


which is distributed as a Student variate with $n$ degrees of freedom. In general,
when $P = I_N$,


(3.7) $u = n^{1/2}a'(b - \beta)/[S(a'a)^{1/2}]$


is a Student variate with $n$ degrees of freedom for any non-null vector $a$.
When $P \ne I_N$, we no longer have $ES^2 = n\sigma^2$, nor is $S^2/\sigma^2$ distributed as a
$\chi^2$ variate. Furthermore, $v$ and $b$ are correlated, so that $S^2$ is not independent of $b$.
We now consider the special case 


(3.8) $P_{ij} = \rho_{|i-j|}$, $\rho_0 = 1$.


It will be assumed that $\rho_1$ is small and $\sum_{k=2}^{N-1}\rho_k$ negligible, so that the departure
from the ideal conditions is not very great. For example, p = ¢€”, 
or 1/(a’k* + 1), a 2 2. At first glance, these assumptions may seem somewhat 
restrictive, but a little reflection will show that they are quite reasonable. If one 
or more autocorrelations are high, it would be desirable to modify the initial 
model by introducing additional regression variables, presumably stochastic in 
nature. For instance, we may introduce a low-order autoregressive scheme for
$\Delta$. In the applications which will follow after the general discussion we will
actually set $\rho_k = 0$ for $k > 1$. In this case $P$ will be written as $P^{(1)}$, and for the
positive-definiteness of $P^{(1)}$ we will require $|\rho_1| < \tfrac{1}{2}\sec[\pi/(N+1)]$.
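The bound on $|\rho_1|$ comes from the smallest characteristic root $1 - 2|\rho_1|\cos[\pi/(N+1)]$ of the tridiagonal matrix $P^{(1)}$. It can be checked numerically through Sylvester's criterion, using the recursion $D_k = D_{k-1} - \rho_1^2D_{k-2}$ for the leading principal minors. The following is a stdlib-only Python sketch with hypothetical names.

```python
import math


def is_positive_definite_p1(rho, N):
    """Sylvester's criterion for P^(1): all leading minors D_1, ..., D_N must be positive."""
    d_km2, d_km1 = 1.0, 1.0  # D_0 and D_1
    for _ in range(2, N + 1):
        d_km2, d_km1 = d_km1, d_km1 - rho * rho * d_km2
        if d_km1 <= 0.0:
            return False
    return True


def rho_bound(N):
    """The bound (1/2) sec[pi/(N+1)] on |rho_1|."""
    return 0.5 / math.cos(math.pi / (N + 1))


N = 10
bound = rho_bound(N)
print(bound, is_positive_definite_p1(bound - 1e-4, N), is_positive_definite_p1(bound + 1e-4, N))
```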








TESTS FOR REGRESSION COEFFICIENTS 931 


Evaluating $B_{ii}$, $V_{ii}$ and $R_{ij}$, where $B$ is defined in (3.2), and $V$ and $R$ in
(3.5), we find

(3.9) $B_{ii} = 1 + 2\sum_{k=1}^{N-1}\rho_k\sum_{s=1}^{N-k}x_{is}x_{i,s+k} \approx 1 + 2\rho_1\sum_{s=1}^{N-1}x_{is}x_{i,s+1}$,

(3.10) $V_{ii} = M_{ii} + 2\sum_{k=1}^{N-1}\rho_k\sum_{s=1}^{N-k}M_{is}M_{i,s+k} \approx M_{ii} - 2\rho_1\sum_{r=1}^{p}x_{ri}(x_{r,i-1} + x_{r,i+1}) + 2\rho_1\sum_{r=1}^{p}\sum_{t=1}^{p}x_{ri}x_{ti}\sum_{s=1}^{N-1}x_{rs}x_{t,s+1}$,

(3.11) $R_{ij} = \sum_{k=1}^{N-1}\rho_k\sum_{s=1}^{N-k}(M_{is}x_{j,s+k} + M_{i,s+k}x_{js}) \approx \rho_1\left[x_{j,i-1} + x_{j,i+1} - \sum_{s=1}^{N-1}\sum_{r=1}^{p}x_{ri}(x_{rs}x_{j,s+1} + x_{r,s+1}x_{js})\right]$,

where $x_{j0} = x_{j,N+1} = 0$, and $\approx$ is to be replaced by $=$ when $P = P^{(1)}$. We notice
that $B_{ii}$ and $V_{ii}$ are of the order of unity, while $R_{ij}$ is of the order of $\rho_1N^{-1/2}$. The
correlation coefficient between $v_i$ and $b_j$ is given by $R_{ij}(V_{ii}B_{jj})^{-1/2} = O(\rho_1N^{-1/2})$.
Hence, if either $N$ is large or $\rho_1$ is small, the vector $v$ is almost independent of
$b$. As a first approximation, therefore, we will derive the distributions of the
test statistics as though $S^2$ were distributed independently of $b$. Now,


$S^2 = v'v = \Delta'M\Delta$





is a non-negative definite quadratic form of rank $n$ in $\Delta$. If $\lambda_1, \dots, \lambda_n$ are the
non-zero characteristic roots of $\Lambda = MP$, the distribution of $S^2/\sigma^2$ is that of
$Q = \sum_{j=1}^{n}\lambda_j\chi_j^2(1)$. Here $\chi^2(\nu)$ denotes a $\chi^2$ variate with $\nu$ degrees of freedom,
and all such variates appearing in a linear combination are independent. We
approximate the probability density function (pdf) of $Q$ by the pdf of a $g\chi^2(h)$
variate, i.e.,


(3.12) $k(z; g, h) = \dfrac{e^{-z/2g}z^{h/2-1}}{(2g)^{h/2}\Gamma(h/2)}$ for $z > 0$, $0$ for $z \le 0$,


where (see, for example, [1])


(3.13) $g = \sum\lambda_j^2/\sum\lambda_j$, $h = (\sum\lambda_j)^2/\sum\lambda_j^2$,


so that the first two moments of $Q$ are equal to the first two moments of $k(z; g, h)$.
$g$ will be called the scaling factor and $h$ the effective number of degrees of freedom
associated with $Q$. Box [1] has shown that $h \le n$. If greater accuracy is desired,
this approximation can be improved in several ways [5]. One way is to write the
pdf of $Q$, $p(z)$, as


(3.14) $p(z) = k(z; g, h)\sum_{m=0}^{\infty}\dfrac{m!\,\Gamma(h/2)}{\Gamma(m + h/2)}(2g)^{-m}d_mL_m^{(h/2-1)}(z/2g)$,


where
$L_m^{(c)}(x) = \sum_{j=0}^{m}\binom{m+c}{m-j}\dfrac{(-x)^j}{j!}$


is the Laguerre polynomial of degree $m$, and the $d_m$ are given by


(3.15) $d_m = (2g)^m\int_0^\infty p(z)L_m^{(h/2-1)}(z/2g)\,dz = \sum_{j=0}^{m}\binom{m + h/2 - 1}{m - j}(-1)^j(2g)^{m-j}\mu_j/j!$,
where $\mu_j = EQ^j$. In particular, $d_0 = 1$, $d_1 = d_2 = 0$, and


(3.16) $d_3 = -\tfrac{4}{3}\left[\sum\lambda_j^3 - hg^3\right]$.


The convergence of such series as (3.14) in the case of a linear combination of
$\chi^2$ variates with all coefficients positive has been proved by Gurland [4].

If $Q_1$ and $Q_2$ are two independent quadratic forms in normal variates with
zero means, the distribution of their ratio can be obtained easily from their
joint distribution, where each distribution is developed in the form (3.14). In
particular, if $Z$ is a $N(0, 1)$ variate and $Q$ is a $\sum_j\lambda_j\chi_j^2(1)$ variate independent of
$Z$, then the distribution of $t = Z(gh)^{1/2}/Q^{1/2}$ is given by


(3.17) $\Pr(|t| \ge t_0) = I_x(h/2, 1/2) + \sum_{m=3}^{\infty}(2g)^{-m}d_m\sum_{j=0}^{m}(-1)^j\binom{m}{j}I_x(j + h/2, 1/2)$,


where $x = (1 + t_0^2/h)^{-1}$, and, for $p > 0$, $q > 0$,


$I_x(p, q) = [B(p, q)]^{-1}\int_0^x z^{p-1}(1 - z)^{q-1}\,dz$ for $0 < x < 1$; $0$ for $x \le 0$; $1$ for $x \ge 1$.


The leading term of (3.17) indicates that $t$ is approximately a Student variate
with $h$ degrees of freedom.
Now,
$z = a'(b - \beta)/[\sigma(a'Ba)^{1/2}]$
is a $N(0, 1)$ variate for any non-zero vector $a$, and is approximately independent
of $S^2/\sigma^2$. Hence


$t = a'(b - \beta)(gh)^{1/2}/[S(a'Ba)^{1/2}]$










is approximately a Student variate with $h$ degrees of freedom. The alternative
test statistics


$w = a'(b - \beta)n^{1/2}/[S(a'Ba)^{1/2}]$, $u = a'(b - \beta)n^{1/2}/[S(a'a)^{1/2}]$,
are related to $t$ by the relations
$w = a_1t$, $u = a_1a_2t$,
$a_1 = (n/gh)^{1/2}$, $a_2 = (a'Ba/a'a)^{1/2}$.

Since $a_1$ does not depend on the vector $a$, we observe that the distribution of $w$
does not depend on the choice of $a$, so that $w_1, w_2, \dots, w_p$ will be identically
distributed as $a_1t$. It will be found in many cases that $a_1 = 1 + O(1/n)$, so that
considering $w$ as a Student variate with $n$ degrees of freedom involves mainly
an error in degrees of freedom, which is not very serious if $n$ and $h$ are both
moderately large. However, in general $a_2$ will not be close to unity, and considering
$u$ as a Student variate with $n$ degrees of freedom will lead to serious errors in
probability statements. Since $a_2$ depends on the choice of $a$, the distribution of
$u$ will change with a change in $a$. In particular, $u_1, \dots, u_p$ will, in general,
have different distributions.

Let $t_\alpha$ denote a number such that $\Pr(|t| \ge t_\alpha) = \alpha$. This number can be
approximately determined through interpolation in existing tables of Student
distributions. The $100(1 - \alpha)$ per cent confidence interval on $\beta_i$ is approximately
given by


(3.18) $b_i - t_\alpha SB_{ii}^{1/2}/(gh)^{1/2} \le \beta_i \le b_i + t_\alpha SB_{ii}^{1/2}/(gh)^{1/2}$.


In many cases it will be difficult to determine the characteristic roots $\lambda_1, \dots, \lambda_n$
of the matrix $\Lambda$. We only require, however, the sums of powers of these roots to
determine the values of $g$, $h$, $d_3$, $d_4$, etc. These may be found by the relations


(3.19) $\sum_j\lambda_j^r = \operatorname{tr}\Lambda^r$, $r = 1, 2, \dots$,
where “tr” stands for the trace of a matrix.
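To illustrate (3.19), the following stdlib-only Python sketch computes $g$ and $h$ of (3.13) and the coefficients $d_0, \dots, d_3$ of (3.15) from traces alone. The moments $\mu_j = EQ^j$ are taken from the cumulants $\kappa_r = 2^{r-1}(r-1)!\sum_j\lambda_j^r$ of $Q$. The example matrix $\Lambda = MP$ corresponds to a single sample mean with $\rho_1 = 0.2$ and $N = 6$ (illustrative values), and the function names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def power_sums(Lam, r_max=3):
    """sum_j lambda_j^r = tr(Lambda^r), r = 1, ..., r_max, as in (3.19)."""
    sums, Lr = [], Lam
    for r in range(1, r_max + 1):
        if r > 1:
            Lr = matmul(Lr, Lam)
        sums.append(sum(Lr[i][i] for i in range(len(Lr))))
    return sums


def gbinom(a, k):
    """Binomial coefficient C(a, k) for real a and integer k >= 0."""
    out = 1.0
    for i in range(k):
        out *= (a - i) / (i + 1)
    return out


def g_h_and_d(Lam):
    """g, h of (3.13) and d_0, ..., d_3 of (3.15), using traces only."""
    s1, s2, s3 = power_sums(Lam, 3)
    g, h = s2 / s1, s1 * s1 / s2
    k1, k2, k3 = s1, 2.0 * s2, 8.0 * s3                          # cumulants of Q
    mu = [1.0, k1, k2 + k1 ** 2, k3 + 3.0 * k2 * k1 + k1 ** 3]   # mu_j = E Q^j
    d = [sum(gbinom(m + h / 2 - 1, m - j) * (-1) ** j * (2 * g) ** (m - j)
             * mu[j] / math.factorial(j) for j in range(m + 1))
         for m in range(4)]
    return g, h, d, (s1, s2, s3)


N, rho = 6, 0.2
M = [[(1.0 if i == j else 0.0) - 1.0 / N for j in range(N)] for i in range(N)]
P = [[1.0 if i == j else (rho if abs(i - j) == 1 else 0.0) for j in range(N)] for i in range(N)]
g, h, d, (s1, s2, s3) = g_h_and_d(matmul(M, P))
print(g, h, d)
```

As a sanity check, $d_1$ and $d_2$ come out numerically zero because $g$ and $h$ match the first two moments. When $P = I_N$ the sketch returns $g = 1$ and $h = n$.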


In the following applications we will confine attention to the case when
$P = P^{(1)}$, i.e., when


$P^{(1)}_{ij} = \rho_{|i-j|}$, $\rho_j = 0$ for $j > 1$.
In the case of testing a single sample mean we will also consider $P = P^{(2)}$, where
$P^{(2)}_{ij} = \rho^{|i-j|}$, $|\rho| < 1$.


It is believed that applications of the theory presented in this section will be
found mostly in the analysis of time series. If we have a record of a time series
which we believe to be stationary, we may wish to test the hypothesis that the
process mean is zero. If we have several samples, we may wish to compare their
means. In some other cases we may wish to test the existence of a linear trend
or of cycles. In all such cases we may assume that the errors form a stationary
process with an autocorrelation function $\rho_k$. The theory then provides adequate
test statistics when $N$ is large, noting that $P_{ij} = \rho_{|i-j|}$.
4. Single sample mean. Let 

$y_t = \beta/N^{1/2} + \Delta_t$, $t = 1, 2, \dots, N$.
Since there is only one regression coefficient, we omit the suffixes from $\beta$ and $B$.
Now

$b = N^{1/2}\bar y$, $S^2 = \sum_{t=1}^{N}y_t^2 - N\bar y^2$.


The elements of the matrix $M$ are obviously $\delta_{ij} - 1/N$, where $\delta_{ij} = 0$ if $i \ne j$ and
$1$ if $i = j$. The usual test statistic concerning $\beta$ is


$u = (\bar y - \beta/N^{1/2})(nN)^{1/2}/S$, $n = N - 1$.
(a) In case $P = P^{(1)}$, we have $B = 1 + 2\rho_1 - 2\rho_1/N$, and, evaluating $\sum\lambda_j$
and $\sum\lambda_j^2$ from the relations (3.19), we obtain
$\sum\lambda_j = \operatorname{tr} MP^{(1)} = n(1 - 2\rho_1/N)$,
$\sum\lambda_j^2 = \operatorname{tr}(MP^{(1)})^2 = n(1 - 4\rho_1/N) + 2(n - 2)\rho_1^2 + 4(N + 1)\rho_1^2/N^2$.
From these, $g$ and $h$ are easily determined for any given value of $\rho_1$, and then
$a_1 = (n/gh)^{1/2}$, $a_2 = B^{1/2}$,
$w = a_1t$, $u = a_1a_2t$.
(b) In case $P = P^{(2)}$, we have, neglecting $\rho^N$,
$B = (1 + \rho)/(1 - \rho) - 2\rho/[N(1 - \rho)^2]$,
$\sum\lambda_j = N - (1 + \rho)/(1 - \rho) + 2\rho/[N(1 - \rho)^2]$,
$\sum\lambda_j^2 = N(1 + \rho^2)/(1 - \rho^2) - (1 + \rho)^2/(1 - \rho)^2 - 2\rho^2/(1 - \rho^2)^2$
$\quad + 4\rho(1 - \rho^3)/[N(1 - \rho)^4(1 + \rho)] + 4\rho^2/[N^2(1 - \rho)^4]$.
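The closed forms in (a) and (b) can be checked against direct trace computations via (3.19). Case (a) is exact, while case (b) neglects terms of order $N\rho^N$, so a long series is used for it. The following is a stdlib-only Python sketch with hypothetical names and illustrative parameter values.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def single_mean_check(N, P):
    """Numerical B, tr(MP) and tr(MP)^2 for the single-mean design x = N^{-1/2} 1'."""
    M = [[(1.0 if i == j else 0.0) - 1.0 / N for j in range(N)] for i in range(N)]
    Lam = matmul(M, P)
    B = sum(sum(row) for row in P) / N           # B = x P x' = 1'P1 / N
    return B, trace(Lam), trace(matmul(Lam, Lam))


def p1(N, rho):
    return [[1.0 if i == j else (rho if abs(i - j) == 1 else 0.0) for j in range(N)] for i in range(N)]


def p2(N, rho):
    return [[rho ** abs(i - j) for j in range(N)] for i in range(N)]


# (a) P = P^(1)
N, r1 = 10, -0.2
n = N - 1
numeric_a = single_mean_check(N, p1(N, r1))
formulas_a = (1 + 2 * r1 - 2 * r1 / N,
              n * (1 - 2 * r1 / N),
              n * (1 - 4 * r1 / N) + 2 * (n - 2) * r1 ** 2 + 4 * (N + 1) * r1 ** 2 / N ** 2)

# (b) P = P^(2), with N large enough that rho^N is negligible
N2, r = 60, 0.5
numeric_b = single_mean_check(N2, p2(N2, r))
formulas_b = ((1 + r) / (1 - r) - 2 * r / (N2 * (1 - r) ** 2),
              N2 - (1 + r) / (1 - r) + 2 * r / (N2 * (1 - r) ** 2),
              N2 * (1 + r * r) / (1 - r * r) - (1 + r) ** 2 / (1 - r) ** 2
              - 2 * r * r / (1 - r * r) ** 2
              + 4 * r * (1 - r ** 3) / (N2 * (1 - r) ** 4 * (1 + r))
              + 4 * r * r / (N2 ** 2 * (1 - r) ** 4))
print(numeric_a, formulas_a)
```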
As an illustration, values of $g$, $h$ and approximate 5% points of $t$, $w$ and $u$ for
$\rho_1 = -.2, 0, +.2$ when $N = 10$ are given in the following table. The top value
in each column corresponds to $P = P^{(1)}$ and the bottom value to $P = P^{(2)}$.


er £ a tos wos 


2.282 2. 
2.285 2 


—.2 . 1.100 
1.102 


2.262 
2.262 


83 |82|88| 


- 285 
. 284 


_— 










5. Two samples. We distinguish the samples and associated quantities by a
subscript or an additional subscript, e.g.,


$y_{it} = \beta_i/N_i^{1/2} + \Delta_{it}$, $t = 1, 2, \dots, N_i$, $i = 1, 2$,
where $\Delta_i$, $i = 1, 2$, are independent $N(0, \sigma_i^2P_i)$ vector variates. The case
$P_i = I_{N_i}$, $i = 1, 2$, and $\sigma_1^2 \ne \sigma_2^2$ has been studied by Welch [7]. We will treat here
the general case when $N_i$, $\sigma_i^2$, $P_1$ and $P_2$ are arbitrary. We will assume that $P_1$
and $P_2$ and $\theta = \sigma_1^2/\sigma_2^2$ are known and that $N_1$ and $N_2$ are large. The variate

$Z = [\bar y_1 - \bar y_2 - \beta_1/N_1^{1/2} + \beta_2/N_2^{1/2}]/[\sigma_2(\theta B_1/N_1 + B_2/N_2)^{1/2}]$
is a $N(0, 1)$ variate, and
$S_i^2 = \sum_{t=1}^{N_i}y_{it}^2 - N_i\bar y_i^2 = \Delta_i'M_i\Delta_i$, $i = 1, 2$,
are distributed, independently of each other and approximately independently
of $Z$, as $\sigma_i^2\sum_{j=1}^{n_i}\lambda_{ij}\chi_{ij}^2(1)$, where $\lambda_{ij}$, $j = 1, \dots, n_i$, are the non-zero characteristic
roots of $\Lambda_i = M_iP_i$ and $n_i = N_i - 1$. Hence $Q = S_1^2/\sigma_1^2 + S_2^2/\sigma_2^2$ is distributed as
$\sum_i\sum_j\lambda_{ij}\chi_{ij}^2(1)$. Let $g$ and $h$ be the scaling factor and the effective degrees
of freedom associated with $Q$, i.e.,
$h = (\sum_i\sum_j\lambda_{ij})^2/\sum_i\sum_j\lambda_{ij}^2$, $g = \sum_i\sum_j\lambda_{ij}^2/\sum_i\sum_j\lambda_{ij}$,
where the summations over $j$ are from 1 to $n_i$, and over $i$ from 1 to 2. Then
$t = [\bar y_1 - \bar y_2 - \beta_1/N_1^{1/2} + \beta_2/N_2^{1/2}](gh)^{1/2}/\{(\theta B_1/N_1 + B_2/N_2)(S_1^2/\theta + S_2^2)\}^{1/2}$
is approximately a Student variate with $h$ degrees of freedom.
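The two-sample recipe pools the traces of $\Lambda_1$ and $\Lambda_2$ before forming $g$ and $h$. The following is a minimal stdlib-only Python sketch. The function names and data are hypothetical, and `mean_diff0` stands for the hypothesised value of $\beta_1/N_1^{1/2} - \beta_2/N_2^{1/2}$. With $P_i = I$ and $\theta = 1$, the statistic should reduce to the ordinary pooled-variance $t$ with $N_1 + N_2 - 2$ d.f.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def centring_matrix(N):
    """M = I - (1/N) 1 1' for the single-mean design."""
    return [[(1.0 if i == j else 0.0) - 1.0 / N for j in range(N)] for i in range(N)]


def combined_g_h(Lam1, Lam2):
    """g and h for Q = S1^2/sigma1^2 + S2^2/sigma2^2, from tr(Lambda_i) and tr(Lambda_i^2)."""
    s1 = sum(L[i][i] for L in (Lam1, Lam2) for i in range(len(L)))
    s2 = 0.0
    for L in (Lam1, Lam2):
        L2 = matmul(L, L)
        s2 += sum(L2[i][i] for i in range(len(L2)))
    return s2 / s1, s1 * s1 / s2


def two_sample_t(y1, y2, mean_diff0, B1, B2, theta, g, h):
    """Approximate Student statistic with h d.f., as in Section 5."""
    N1, N2 = len(y1), len(y2)
    m1, m2 = sum(y1) / N1, sum(y2) / N2
    S1 = sum(v * v for v in y1) - N1 * m1 ** 2
    S2 = sum(v * v for v in y2) - N2 * m2 ** 2
    num = (m1 - m2 - mean_diff0) * math.sqrt(g * h)
    return num / math.sqrt((theta * B1 / N1 + B2 / N2) * (S1 / theta + S2))


y1 = [1.0, 2.0, 3.0, 4.0]
y2 = [2.0, 4.0, 6.0]
g0, h0 = combined_g_h(centring_matrix(len(y1)), centring_matrix(len(y2)))  # P_i = I case
t0 = two_sample_t(y1, y2, 0.0, B1=1.0, B2=1.0, theta=1.0, g=g0, h=h0)
print(g0, h0, t0)
```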


6. Linear trend. We take $N$ to be an odd integer and consider the linear trend
in the form


$y_t = N^{-1/2}\beta_1 + [N(N^2 - 1)]^{-1/2}(12)^{1/2}\beta_2\{t - (N + 1)/2\} + \Delta_t$,


$t = 1, 2, \dots, N$.
From [6], we have


$b_1 = N^{1/2}\bar y$, $b_2 = (12)^{1/2}N^{-1/2}(N^2 - 1)^{-1/2}\sum_{t=1}^{N}ty_t - (3)^{1/2}N^{1/2}(N + 1)^{1/2}(N - 1)^{-1/2}\bar y$,


$S^2 = \sum_{t=1}^{N}y_t^2 - b_1^2 - b_2^2$.


The elements of the matrix $M$ are given by
$M_{ij} = \delta_{ij} - 1/N - 3[N(N^2 - 1)]^{-1}(2i - N - 1)(2j - N - 1)$.
If $P = P^{(1)}$, we have
$B_{11} = 1 + 2\rho_1 - 2\rho_1/N$, $B_{12} = 0$,
$B_{22} = 1 + 2\rho_1 - 6\rho_1/N$.







Evaluating $\sum\lambda_j$ and $\sum\lambda_j^2$, we find, writing $n = N - 2$,
$\sum\lambda_j = n(1 - 4\rho_1/N)$,
$\sum\lambda_j^2 = n(1 - 8\rho_1/N) + 2(n - 3)\rho_1^2 + 16\rho_1^2/N$
$\quad + 16\rho_1^2/N^2 - 24\rho_1^2/[N^2(N - 1)]$,
from which $h$ and $g$ are determined. Finally,
$t = [a_1(b_1 - \beta_1) + a_2(b_2 - \beta_2)](gh)^{1/2}/[S(a_1^2B_{11} + a_2^2B_{22})^{1/2}]$


is approximately a Student variate with $h$ degrees of freedom for arbitrary constants
$a_1$ and $a_2$, not both equal to zero.
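The linear-trend expressions for $B_{11}$, $B_{22}$, $\sum\lambda_j$ and $\sum\lambda_j^2$ under $P^{(1)}$ can be verified by building $x$, $M$ and $\Lambda = MP$ explicitly. The following stdlib-only Python sketch uses hypothetical names and illustrative values of $N$ and $\rho_1$.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def linear_trend_check(N, rho):
    """Numerical B_11, B_22, tr(MP), tr(MP)^2 for the linear-trend design with P^(1)."""
    c = math.sqrt(12.0 / (N * (N * N - 1)))
    x = [[1.0 / math.sqrt(N)] * N,
         [c * (t - (N + 1) / 2) for t in range(1, N + 1)]]   # rows of x, so x x' = I_2
    P = [[1.0 if i == j else (rho if abs(i - j) == 1 else 0.0) for j in range(N)] for i in range(N)]
    B = matmul(matmul(x, P), [list(col) for col in zip(*x)])
    M = [[(1.0 if i == j else 0.0) - sum(x[r][i] * x[r][j] for r in range(2))
          for j in range(N)] for i in range(N)]
    Lam = matmul(M, P)
    return B[0][0], B[1][1], trace(Lam), trace(matmul(Lam, Lam))


N, rho = 9, 0.3
n = N - 2
numeric = linear_trend_check(N, rho)
closed = (1 + 2 * rho - 2 * rho / N,
          1 + 2 * rho - 6 * rho / N,
          n * (1 - 4 * rho / N),
          n * (1 - 8 * rho / N) + 2 * (n - 3) * rho ** 2 + 16 * rho ** 2 / N
          + 16 * rho ** 2 / N ** 2 - 24 * rho ** 2 / (N ** 2 * (N - 1)))
print(numeric, closed)
```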


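7. Regression on trigonometric functions. Consider

$y_t = \beta_1/N^{1/2} + (2/N)^{1/2}\sum_{i=1}^{q}(\beta_{2i}\cos\mu_it + \beta_{2i+1}\sin\mu_it) + \Delta_t$, $t = 1, 2, \dots, N$,


where $\mu_i = 2\pi w_i/N$, $i = 1, \dots, q$, and the $w_i$ are positive integers less than $N/2$ and
different from each other. Again, from [6] we obtain


$b_1 = N^{1/2}\bar y$, $b_{2i} = (2/N)^{1/2}\sum_{t=1}^{N}y_t\cos\mu_it$, $b_{2i+1} = (2/N)^{1/2}\sum_{t=1}^{N}y_t\sin\mu_it$,


$S^2 = \sum_{t=1}^{N}y_t^2 - \sum_{i=1}^{2q+1}b_i^2$,
$n = N - 2q - 1$,
$M_{st} = \delta_{st} - N^{-1} - 2N^{-1}\sum_{i=1}^{q}\cos\mu_i(s - t)$.
Assuming $P = P^{(1)}$, we also have
$B_{11} = 1 + 2\rho_1 - 2\rho_1/N$,
$B_{2i,2i} = 1 + 2\rho_1\cos\mu_i - 4N^{-1}\rho_1\cos\mu_i$,
$B_{2i+1,2i+1} = 1 + 2\rho_1\cos\mu_i$.


Evaluating $\sum\lambda_j$ and $\sum\lambda_j^2$, we find


$\sum\lambda_j = n - 2\rho_1 - 4\rho_1\sum_{i=1}^{q}\cos\mu_i + 2N^{-1}\rho_1 + 4N^{-1}\rho_1\sum_{i=1}^{q}\cos\mu_i$,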
tml t=] 


@ a 
zr =n— 2(n- 2) pi — 4p, — 8p. >. COS wy — 401 2 cos 2y,; 
1 =! 


q q 
+ 4N~"p, (1 +2 7 cos us) + 4N ‘pi (1 +2 7 cos 2 u:) 
t=1 t=1 





TESTS FOR REGRESSION COEFFICIENTS 


@ q 
+ AN*oi (24? + 89 +1 + 2 3 cos we + 27 008 2 m 


¢ 
+2 DF c08 ws 008 my) 
+) 


The remaining steps for testing any one of the regression coefficients are straight- 
forward. 
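As in the linear-trend case, the two traces can be checked numerically. The sketch below assumes $P = P^{(1)}$ (unit diagonal, $\rho_1$ on the first off-diagonals) and frequencies $0 < \omega_i < N/2$, all distinct, so that the regressors are orthonormal; these conditions, and the helper names, are assumptions of the illustration.

```python
import math


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def trig_traces(N, omegas, rho1):
    """Return (tr(MP), tr(MPMP)) for the trigonometric-regression design."""
    regressors = [[1.0 / math.sqrt(N)] * N]
    for w in omegas:
        mu = 2 * math.pi * w / N
        regressors.append([math.sqrt(2.0 / N) * math.cos(mu * t) for t in range(1, N + 1)])
        regressors.append([math.sqrt(2.0 / N) * math.sin(mu * t) for t in range(1, N + 1)])
    # M = I minus the projection onto the 2q + 1 orthonormal regressors.
    M = [[(1.0 if i == j else 0.0) - sum(x[i] * x[j] for x in regressors)
          for j in range(N)] for i in range(N)]
    P = [[1.0 if i == j else (rho1 if abs(i - j) == 1 else 0.0)
          for j in range(N)] for i in range(N)]
    MP = matmul(M, P)
    MPMP = matmul(MP, MP)
    return sum(MP[i][i] for i in range(N)), sum(MPMP[i][i] for i in range(N))


def trig_closed_form(N, omegas, rho1):
    q, r = len(omegas), rho1
    n = N - 2 * q - 1
    cos1 = [math.cos(2 * math.pi * w / N) for w in omegas]   # cos(mu_i)
    C1 = sum(cos1)
    C2 = sum(math.cos(4 * math.pi * w / N) for w in omegas)  # sum of cos(2 mu_i)
    cross = C1 ** 2 - sum(c * c for c in cos1)               # sum over i != j
    s1 = n - 2 * r - 4 * r * C1 + 2 * r / N + 4 * r * C1 / N
    s2 = (n + 2 * (n - 2) * r ** 2 - 4 * r - 8 * r * C1 - 4 * r ** 2 * C2
          + 4 * r * (1 + 2 * C1) / N + 4 * r ** 2 * (1 + 2 * C2) / N
          + 4 * r ** 2 * (2 * q ** 2 + 3 * q + 1 + 2 * C1 + C2 + 2 * cross) / N ** 2)
    return s1, s2


print(trig_traces(11, (1, 3), 0.3))
print(trig_closed_form(11, (1, 3), 0.3))
```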


8. Concluding remarks. In the preceding discussion we have assumed that

(i) the elements of $P$ are $P_{ij} = \rho_{|i-j|}$, with $\rho_0 = 1$;
(ii) $\rho_1$ is small, $\rho_2, \rho_3, \ldots$ are negligible, and $\rho_1$ is known; or
(ii') $N$ is large and $\rho_1$ is known.


As was remarked earlier, if $P$ is known, it is possible to find a non-singular matrix $D$ such that $DPD' = I_N$. The transformation $y^* = Dy$ then takes us back to the ideal situation, since the covariance matrix of $\Delta^* = D\Delta$ is $\sigma^2 I_N$. On theoretical grounds such a transformation is desirable before applying the least-squares method; equivalently, the minimum variance unbiased linear estimate of $\beta$, obtained by minimizing $\Delta'P^{-1}\Delta$, must be preferred over the least-squares estimate $b$, obtained by minimizing $\Delta'\Delta$. The reasons for using the latter procedure must therefore be sought in practical considerations, of which the following may be enumerated. Firstly, if $N$ is moderately large, it may become quite laborious to evaluate $D$ or to work with the transformed variable $y^*$. Secondly, one may be dealing with several regression problems, the covariance matrices of errors in different problems being different, and one may wish to streamline the calculations. Thirdly, and this is the most important reason, in almost all practical situations $P$ will be unknown. In this case, if we estimate $\beta$ and the elements of $P$ (under some assumed structure other than $I_N$) simultaneously, say by the maximum likelihood method, the estimate $\hat\beta$ of $\beta$ will become non-linear in $y$. The problem of finding the distribution of $\hat\beta$, and of obtaining suitable statistics for testing hypotheses concerning $\beta$, will then become extremely complicated. The only suitable procedure seems to be to proceed as if $P = I_N$ and to obtain the least-squares estimates $b$, which are linear, unbiased, and asymptotically efficient. The autocorrelations appearing in the statistic $t$ will have to be replaced by the serial correlations calculated from the residual, $v$. Although this point needs further investigation, it is the feeling of the writer that, for large $N$, the significance level of $t$ will not be affected seriously, at least under the assumption that only the first autocorrelation will be estimated. The error involved in using the sample serial correlation in place of the unknown autocorrelation will, presumably, be of order $N^{-1/2}$ in probability.
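One concrete choice of $D$, assuming $P$ is known and positive definite, is the inverse of the lower-triangular Cholesky factor $C$ of $P$ (so $P = CC'$ and $D = C^{-1}$). The paper does not say how $D$ should be computed; the pure-Python sketch below, with hypothetical helper names, only shows that such a $D$ is easy to obtain for small $N$. Its cost grows like $N^3$, which illustrates the first practical objection raised above.

```python
import math


def cholesky(P):
    """Lower-triangular C with C C' = P (P symmetric positive definite)."""
    n = len(P)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C


def lower_inverse(C):
    """Inverse of a lower-triangular matrix, solving C D = I by forward substitution."""
    n = len(C)
    D = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(C[i][k] * D[k][col] for k in range(i))
            D[i][col] = rhs / C[i][i]
    return D


def whitening_matrix(P):
    """One non-singular D with D P D' = I: D = C^{-1}, where P = C C'."""
    return lower_inverse(cholesky(P))


def max_identity_error(D, P):
    """Largest |(D P D')_ij - delta_ij|, to confirm that D whitens P."""
    n = len(P)
    DP = [[sum(D[i][k] * P[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return max(abs(sum(DP[i][k] * D[j][k] for k in range(n)) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


# Example: P^(1) with N = 6 and rho_1 = 0.4 (positive definite since |rho_1| < 1/2).
P_example = [[1.0 if i == j else (0.4 if abs(i - j) == 1 else 0.0) for j in range(6)]
             for i in range(6)]
print(max_identity_error(whitening_matrix(P_example), P_example))  # close to 0
```

Any other $D$ with $DPD' = I_N$ (for instance a symmetric inverse square root of $P$) would serve equally well; the triangular choice is simply the cheapest to write down.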

We further observe that, as a first approximation, the distribution of $t$ was obtained as if $S$ were independent of $b$. It would be of interest to improve this approximation by taking into consideration the correlation between $b$ and $S$.







REFERENCES 

[1] G. E. P. Box, "Some theorems on quadratic forms applied in the study of analysis of variance problems, I," Ann. Math. Stat., Vol. 25 (1954), pp. 290-302.

[2] G. E. P. Box, "Some theorems on quadratic forms applied in the study of analysis of variance problems, II," Ann. Math. Stat., Vol. 25 (1954), pp. 484-498.

[3] H. E. Daniels, "The effect of departures from ideal conditions other than non-normality on the t and z tests of significance," Proc. Cambridge Philos. Soc., Vol. 34 (1938), pp. 321-328.

[4] John Gurland, "Distribution of quadratic forms and ratios of quadratic forms," Ann. Math. Stat., Vol. 24 (1953), pp. 416-427.

[5] P. B. Patnaik, "The non-central χ² and F-distributions and their applications," Biometrika, Vol. 36 (1949), pp. 202-232.

[6] M. M. Siddiqui, "Covariances of least-squares estimates when residuals are correlated," Ann. Math. Stat., Vol. 29 (1958), pp. 1251-1256.

[7] B. L. Welch, "The significance of the difference between two means when the population variances are unequal," Biometrika, Vol. 29 (1937), pp. 350-362.





MIXED MODEL VARIANCE ANALYSIS WITH NORMAL ERROR 
AND POSSIBLY NON-NORMAL OTHER RANDOM 
EFFECTS: PART I: THE UNIVARIATE CASE¹


By S. N. Roy and Whitfield Cobb
University of North Carolina 

0.1. Introduction and summary. The mixed model with one factor represented by fixed effects, one factor by random effects, and a normal error has often stipulated that these random effects be a sample drawn from a normally distributed population. In the case of a single-response (or univariate) experiment, the variance of this normal distribution is a natural measure of the dispersion of these random effects, and confidence bounds on the ratio of this variance to the error variance [13], [14], [9], and simultaneous confidence bounds on both variances (in the latter case with a confidence coefficient $\ge$ a specified value) [9], have already been found for certain classes of experimental designs. But when a distribution is not normal, or is not assumed at the outset to be normal, the variance may not reveal as much about the distribution as some other measure such as the interquartile range. In the present paper we seek confidence bounds on what, in a sense to be explained presently, might be called representations of the interquartile range and of analogous differences between higher-order quantiles of the population from which the random effects are drawn. The method of obtaining these bounds involves an element of approximation comparable to grouping continuous data into $k$ classes, since it replaces the actual random-effects variate by a "substitute variate" having $k$ equally probable discrete values. The main idea is this. Let us assume, for simplicity of discussion, that we have a real-valued stochastic variate. If the stochastic variate is observable, it seems natural to attempt to approximate its unknown distribution by introducing unknown probabilities over a finite set of preassigned class intervals, then trying to estimate these probabilities, and then (especially for a continuous distribution) increasing the number of class intervals. On the other hand, if the variate is unobservable, as in the present set-up, it seems natural to try to approximate the distribution by replacing the stochastic variate by a "substitute" variate which is supposed to take, as a first approximation, two (unknown) values with equal probabilities, or, as a second approximation, three (unknown) values with equal probabilities, or in general $k$ (unknown) values with equal probabilities. We then try to estimate, in terms of our observations, these unknown values, which may be regarded as approximations to the 1st, 3rd, ..., $(2k-1)$th $2k$-tiles of the unknown distribution.
The random-effects variate postulated in our model may have either a continuous or a discrete distribution, provided in the latter case there are enough distinct values to make these $k$ quantiles meaningful parameters of the distribution. From now on, for brevity, we shall refer to these unknown values as the quantiles of the unknown distribution. It will be seen later that it is the differences between these unknown values, rather than the unknown values themselves, which we can estimate or make inferences about, and the number, $k - 1$, of such differences which can be estimated is restricted by the experiment.

Received August 7, 1959; revised May 25, 1960.
¹ This research was sponsored by the United States Air Force through the Office of Scientific Research of the Air Research and Development Command.

Turning now to the confidence bounds, we observe that in the derivation of these bounds use is made of the same kind of sums of squares as in the normal variance-components analysis. Unlike the more familiar confidence statements, where the confidence coefficient may be specified at will, here, except for the case of two blocks, only a lower bound on the confidence coefficient is specifiable, and this bound includes as a factor a decreasing function of $k$, the number of discrete values of the substitute variate. For $k = 2, 3, 4, 5$ the geometric shape of a $(k-1)$-dimensional confidence region has been found.

It is also shown how the usual inference about the fixed effects can be made 
from this model and then how the above type of confidence bounds can be found 
for each of several random-effects factors in an experiment with orthogonal de- 
sign. 

In the case of a multiresponse—usually called multivariate—experiment, the 
model frequently stipulates that the random effects be samples from a multi- 
variate normal population. Although the variance matrix of this distribution has 
a readily available estimator, confidence bounds have presented many diffi- 
culties. For the extremely restricted model in which the variance matrix of each 
random-effects factor is proportional to the variance matrix of the error, Roy and 
Gnanadesikan [10] obtained simultaneous confidence bounds on the characteristic 
roots of this latter matrix and on the proportionality constants. In Part II of this 
paper the authors present (with a confidence coefficient greater than or equal to 
a preassigned value) confidence bounds on the characteristic roots of the variance- 
covariance matrix of a random-effects variate without assuming any such rela- 
tion to the error matrix. The second part will also consider the p-variate exten- 
sion of the univariate substitute variate and the associated confidence bounds 
for the case where the p-dimensional distribution of the random effects is not 
necessarily normal. This development will be only indicated in principle for 
p > 2 but will be discussed in some detail for the case p = 2. 


0.2. Notation and presuppositions. A general $m$ by $n$ matrix will be denoted by a capital Latin letter from either end of the alphabet, say $A(m \times n)$, its transpose by $A'(n \times m)$, but certain special letters will denote special types of matrices. Thus $I(m)$ denotes the $m$-rowed identity matrix; $J(m \times n)$ has every element unity; $K((m+1) \times m)$ is $I(m)$ bordered below by a row of zeros; $O(m \times n)$ has every element zero. Triangular matrices are denoted by $T(m)$, orthonormal ones by $L(m \times n)$. Repeated use is made of an $(m-1) \times m$ orthonormal matrix ($m - 1$ mutually orthogonal rows of $m$ elements) all of whose rows are orthogonal to a row of $m$ identical elements. A (but by no means the only) matrix having these properties is readily obtained by removing the row of identical elements from the matrix of the Helmert transformation (Kendall and Buckland, A Dictionary of Statistical Terms, p. 126). For this reason we denote any such matrix by $H((m-1) \times m)$. The maximum and minimum characteristic roots of the matrix $A$ are denoted by $\mathrm{ch}_{\max}(A)$ and $\mathrm{ch}_{\min}(A)$ respectively. A column vector of $m$ components is denoted by a lower-case letter such as $a(m)$, whereas $a'(m)$ denotes a row vector. In particular $o$ and $j$ denote vectors whose components are all zero and all unity, respectively. After the dimensions of a matrix or vector have been indicated, that part of the symbol may be omitted in subsequent references to the same matrix or vector.
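For concreteness, here is one way to construct such an $H((m-1) \times m)$ from the Helmert transformation. The row ordering below is an assumption (other orderings, and other matrices altogether, serve equally well, as the text notes), and the function name is illustrative.

```python
import math


def helmert_contrasts(m):
    """Helmert matrix with the constant row removed: (m - 1) x m.

    Row k (k = 1, ..., m - 1) has k entries 1/sqrt(k(k+1)), then -k/sqrt(k(k+1)),
    then zeros, so each row has unit length and sums to zero.
    """
    H = []
    for k in range(1, m):
        norm = math.sqrt(k * (k + 1))
        H.append([1.0 / norm] * k + [-k / norm] + [0.0] * (m - k - 1))
    return H


for row in helmert_contrasts(3):
    print([round(x, 4) for x in row])
# [0.7071, -0.7071, 0.0]
# [0.4082, 0.4082, -0.8165]
```

Such an $H$ satisfies $HH' = I(m-1)$, $Hj = o$ and $H'H = I(m) - m^{-1}J(m)$; the last identity is what connects $H$ with corrected sums of squares later on.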

Existence theorems for triangular matrix factors of certain matrices are 
proved in [12]. 

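The direct sum of $A(m \times n)$ and $B(p \times q)$ is defined to be

$$\begin{bmatrix} A(m \times n) & O(m \times q) \\ O(p \times n) & B(p \times q) \end{bmatrix}$$

and is denoted by $A \dotplus B$. The left direct product (Kronecker product) of $A(m \times n)$ and $B(p \times q)$ is defined to be

$$\begin{bmatrix} b_{11}A & b_{12}A & \cdots & b_{1q}A \\ b_{21}A & b_{22}A & \cdots & b_{2q}A \\ \vdots & \vdots & & \vdots \\ b_{p1}A & b_{p2}A & \cdots & b_{pq}A \end{bmatrix}$$

and is denoted by $A \mathbin{\cdot\!\times} B$. Both the notation and the properties of direct sums and products may be found in [5]. Inverses of partitioned and patterned matrices will be found by the methods of [1] and [11].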
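Both definitions translate directly into code. The sketch below uses plain lists of rows and hypothetical function names; note that, because block $(i, j)$ of the left direct product is $b_{ij}A$, it coincides with what most current software calls the Kronecker product of $B$ and $A$ (for example `numpy.kron(B, A)`), with the arguments in that order.

```python
def direct_sum(A, B):
    """Block-diagonal matrix [[A, O], [O, B]] for A (m x n) and B (p x q)."""
    n, q = len(A[0]), len(B[0])
    return [row + [0] * q for row in A] + [[0] * n + row for row in B]


def left_direct_product(A, B):
    """(p m) x (q n) matrix whose (i, j) block is b_ij * A."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    return [[B[i // m][j // n] * A[i % m][j % n] for j in range(q * n)]
            for i in range(p * m)]


print(direct_sum([[1, 2]], [[3], [4]]))
# [[1, 2, 0], [0, 0, 3], [0, 0, 4]]
print(left_direct_product([[1, 2]], [[1], [3]]))
# [[1, 2], [3, 6]]
```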

Some properties of special matrices are listed in Appendix A and referred to by 
number, e.g. (A.4), when specifically used. 


1. Random Effects in a Two-Factor Mixed Model Uniresponse Problem. 

1.1. The two-factor model and its structure matrix. We start with a general two-factor model in which each observation is assumed to be the sum of three terms, the first two corresponding to the two factors or criteria by which the experimental units are classified and the third term being of the nature of an error. Included in this model is the postulate that the $n$ errors, in a set of $n$ observations, are independently and identically distributed normal deviates, each with distribution denoted by $N(0, \sigma^2)$. We shall suppose that there are $t$ categories in the first classification and $s$ in the second. The unobservable terms or elements of the model we denote by $a_i$, $i = 1, 2, \ldots, t$, and $b_j$, $j = 1, 2, \ldots, s$. For convenience we shall refer to these $a$'s and $b$'s as treatment effects and block effects respectively, but this designation should not limit the application or prejudice the interpretation of the model. The important distinction is that whereas the $a$'s are regarded as unknown but fixed constants, the $b$'s are assumed to be a sample of size $s$ from some unknown (but presumably continuous) distribution, which may or may not be normal but is postulated to be independent of the normal error.

In any planned experiment there will be $n$ experimental units, say one from each block-treatment cell. This is before we have made any observations at all on any unit. Now, according as we make one or several types of observations on each unit, the problem is univariate or multivariate (that is, uniresponse or multiresponse). The response from each experimental unit would presumably depend on the block-treatment cell the unit came from. For a uniresponse problem we have $n$ observations on the $n$ experimental units. If the observations, treatment effects, block effects, and errors are respectively ordered and written as column vectors $y(n)$, $a(t)$, $b(s)$, $e(n)$, then the two-factor model described above can be represented by

$$y(n) = M(n \times (s+t))\begin{bmatrix} a(t) \\ b(s) \end{bmatrix} + e(n), \tag{1.1.1}$$

where

$$M(n \times (s+t)) = [M_0(n \times t),\ M_1(n \times s)]. \tag{1.1.2}$$

That $M$ is partitioned into two submatrices and that $\operatorname{rank}(M) \le s + t - 1$ result from the basic assumptions of the two-factor mixed model as stated above. Hence $M$ may with some justification be called the "model matrix" for the observations $y$. On the other hand, the actual elements of $M$ are not determined until a specific experimental design has been selected and each experimental unit uniquely classified according to a field plan consistent with this design. Hence $M$ has frequently been called the "design matrix." Throughout this investigation, both the particular field plan and the type of design will be left unspecified, but the general pattern or structure of $M$ will be known. For these reasons we shall call $M$ by the less specific name, "structure matrix."

We require that the design be connected, but we do not at present specify whether it is complete or incomplete, balanced or partially balanced. In any of these cases an experimental unit will belong to one and only one category of each factor, and hence each row of each submatrix, $M_0$ and $M_1$, will consist of zeros except for unity in some one column. Thus for a two-factor model the structure matrix will always be such that

$$M(n \times (s+t))\begin{bmatrix} j(t) \\ -j(s) \end{bmatrix} = M_0 j - M_1 j = o. \tag{1.1.3}$$

In what follows we shall consider this to be the only independent linear relation among the columns of $M$. Since any useful design will have $n \ge s + t$, it follows from (1.1.3) that $\operatorname{rank}(M) = s + t - 1 = r^*$, and hence any $r^*$ columns of $M$ would constitute a basis for $M$. We agree to use the first $r^*$ columns and to denote this basis by $M_*(n \times r^*)$. Thus, by (A.4),

$$M_*(n \times r^*) = M(n \times (s+t))\,K((s+t) \times (s+t-1)) = [M_0(n \times t),\ M_1(n \times s)K(s \times (s-1))], \tag{1.1.4}$$

where $K$ is the special matrix defined in Section 0.2. Then $M$ is expressible in terms of its basis,

$$M(n \times (s+t)) = M_*(n \times r^*)[I(r^*),\ f(r^*)], \tag{1.1.5}$$

where $f'(r^*) = [j'(t),\ -j'(s-1)]$.
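As a concrete instance of (1.1.3) and of the rank statement, the sketch below builds $M = [M_0, M_1]$ for a randomized complete block layout, chosen only as one example of a connected design. It checks that $M[j'(t), -j'(s)]' = o$ and that $\operatorname{rank}(M) = s + t - 1$. The function names are hypothetical, and exact rational arithmetic is used for the rank.

```python
from fractions import Fraction


def rcb_structure_matrix(t, s):
    """Rows = units (block j, treatment i); columns = [M0 | M1] indicators."""
    M = []
    for j in range(s):
        for i in range(t):
            row = [0] * (t + s)
            row[i] = 1        # treatment indicator (submatrix M0)
            row[t + j] = 1    # block indicator (submatrix M1)
            M.append(row)
    return M


def matrix_rank(M):
    """Rank by Gauss-Jordan elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r


M = rcb_structure_matrix(3, 4)
print(matrix_rank(M))                          # 6, i.e. s + t - 1
print({sum(row[:3]) - sum(row[3:]) for row in M})  # {0}: relation (1.1.3)
```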


1.2. Three orthonormal transformations and the resulting statistics. Whether the ultimate purpose is estimation, testing hypotheses, or confidence bounds, statistical inference about the block effects will require a set of statistics whose distribution depends upon $b$ and not upon $a$; and statistical inference about the treatment effects will require a set of statistics whose distribution depends upon $a$ and not upon $b$. To this end we define $u$ and $v$ by the following transformation on the observations $y$:

$$\begin{bmatrix} u(t-1) \\ v(s-1) \end{bmatrix} = [T_0^{-1}(t-1)H((t-1) \times t) \dotplus T_1^{-1}(s-1)H((s-1) \times s)K(s \times (s-1))](M_*'M_*)^{-1}M_*'y, \tag{1.2.1}$$

where $T_0$ and $T_1$ are lower triangular matrices defined by

$$T_0T_0' = [H((t-1) \times t),\ O((t-1) \times (s-1))](M_*'M_*)^{-1}\begin{bmatrix} H'(t \times (t-1)) \\ O((s-1) \times (t-1)) \end{bmatrix} \tag{1.2.2}$$

and

$$T_1T_1' = [O((s-1) \times t),\ H((s-1) \times s)K(s \times (s-1))](M_*'M_*)^{-1}\begin{bmatrix} O(t \times (s-1)) \\ K'((s-1) \times s)H'(s \times (s-1)) \end{bmatrix}. \tag{1.2.3}$$

A convention as to sign makes $T_0$ and $T_1$ unique.


Although (1.2.1) was just used to define the statistics $u$ and $v$, it may be combined with

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} L_0((t-1) \times n) \\ L_1((s-1) \times n) \end{bmatrix} y \tag{1.2.4}$$

to define the matrices $L_0$ and $L_1$. For any $M$ meeting the specifications of Section 1.1, $L_0$ and $L_1$ are individually orthonormal, that is, $L_0L_0' = I$ and $L_1L_1' = I$. However, it is only under additional restrictions to be discussed later (defining what are called orthogonal designs) that $L_0$ would be orthogonal to $L_1$, that is, $L_0L_1' = O$.


By (A.3.11) of [12], the basis $M_*$ of the structure matrix $M$ determines an orthonormal matrix $L_*$ such that

$$M_*' = T(r^*)L_*(r^* \times n). \tag{1.2.5}$$

By a convention of sign, both $T$ and $L_*$ can be made unique. Then any (not unique) orthonormal matrix $L((n - r^*) \times n)$ such that

$$\begin{bmatrix} L_*(r^* \times n) \\ L((n - r^*) \times n) \end{bmatrix} \text{ is orthogonal, i.e. } \begin{bmatrix} L_* \\ L \end{bmatrix}[L_*',\ L'] = I(n), \tag{1.2.6}$$

is used to define a third set of statistics

$$w(n - r^*) = Ly. \tag{1.2.7}$$

Under the same general specifications of Section 1.1, the matrix $L((n - r^*) \times n)$, which has been defined in (1.2.6) as the orthogonal completion of $L_*$, is easily shown to be orthogonal to both $L_0$ and $L_1$. That is, $LL_0' = O$ and $LL_1' = O$. Thus $w$ is a stochastic variate independent of both $u$ and $v$, whether or not the stochastic variates $u$ and $v$ are mutually independent. When (1.2.5), (1.1.5), and (1.1.1) are appropriately substituted into (1.2.7), the mutual orthogonality of $L$ and $L_*$ permits the simplification

$$w = L\left\{L_*'T'[I(r^*),\ f(r^*)]\begin{bmatrix} a \\ b \end{bmatrix} + e\right\} = Le, \tag{1.2.8}$$

which shows that $w$ is entirely free of both treatment effects and block effects.
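Similarly, when (1.1.5), (1.1.1), and (1.2.4) are appropriately substituted into (1.2.1), the following simplification is made possible by (A.1) and (A.7):

$$\begin{aligned}
\begin{bmatrix} u \\ v \end{bmatrix}
&= [T_0^{-1}H \dotplus T_1^{-1}HK](M_*'M_*)^{-1}M_*'\left\{M_*[I(r^*),\ f(r^*)]\begin{bmatrix} a \\ b \end{bmatrix} + e\right\} \\
&= [T_0^{-1}H \dotplus T_1^{-1}HK][I(r^*),\ f(r^*)]\begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} L_0 \\ L_1 \end{bmatrix}e \\
&= \begin{bmatrix} T_0^{-1}Ha + L_0e \\ T_1^{-1}Hb + L_1e \end{bmatrix}.
\end{aligned} \tag{1.2.9}$$

Thus (1.2.9) shows both the general relevance of the statistics $u$ and $v$ to the unobservable effects $a$ and $b$ postulated in the model and also the simple form of this relationship.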

The particular merits of $u$, $v$, and $w$ are: (1) the orthonormality of $L$, $L_0$, and $L_1$ has preserved for $w$ and for the error terms of $u$ and $v$ the same independent normal distribution postulated for $e$; (2) the numbers of components in $u$, $v$, and $w$ are the numbers of degrees of freedom belonging to $a$, $b$, and error respectively; (3) any estimable contrast among the components of $a$ or $b$ can be expressed easily in terms of $u$ or $v$; (4) in a two-factor fixed-effects model the vanishing of $\mathcal{E}(u)$ or $\mathcal{E}(v)$ is a necessary and sufficient condition for the equality of all fixed components of $a$ or $b$ respectively, the condition usually described as "no treatment effect" or "no block effect"; (5) sums of squares for testing such null hypotheses on the assumption of fixed effects may be obtained as inner products, $u'u$ and $v'v$, or the same sums of squares may be obtained without ever finding $T_0$ and $T_1$ explicitly; (6) any other testable hypothesis about the components of $a$ or $b$ can be tested using sums of squares easily obtained from $u$, $v$, and $w$, or expressible in terms of them even when obtained differently. But the chief use to which $v$ and $w$ will be put in the present paper is making some inferences about an unknown population on the assumption that $b$ consists of $s$ independent and identically distributed random samples from that population.

1.3. Quasi-confidence bounds. Since $L_1e$ and $w$ are both $N(o, \sigma^2 I)$, $e'L_1'L_1e/\sigma^2$ and $w'w/\sigma^2$ are both central chi-square variates, with $s - 1$ and $n - r^*$ d.f. respectively. Moreover, since these two chi-squares are independent,

$$\frac{(n - r^*)\,e'L_1'L_1e}{(s-1)\,w'w}$$

has the variance-ratio distribution. Thus for a chosen $\alpha_1 < 1$, there is a constant $F_{\alpha_1}$ such that

$$\Pr\left\{\frac{(n - r^*)\,e'L_1'L_1e}{(s-1)\,w'w} \le F_{\alpha_1}\right\} = 1 - \alpha_1. \tag{1.3.1}$$

It is important to note that (1.3.1) is true regardless of the population from which $b$ is a sample and regardless of the computed values of $v$.

We now proceed from the probability statement (1.3.1), which involves only normal error components, to a confidence-type statement about the unobservable block effects. First the error vector $L_1e$ is expressed as the difference between the computable statistic $v$ and the postulated random vector $T_1^{-1}Hb$. Thus the inequality in (1.3.1) becomes

$$[v - T_1^{-1}Hb]'[v - T_1^{-1}Hb] \le (s-1)\,w'w\,F_{\alpha_1}/(n - r^*)$$

or

$$|v - T_1^{-1}Hb| \le \{(s-1)\,w'w\,F_{\alpha_1}/(n - r^*)\}^{1/2}. \tag{1.3.2}$$
For $s = 2$, $v$ and $T_1^{-1}Hb$ are scalars such that

$$v - \{(s-1)\,w'w\,F_{\alpha_1}/(n - r^*)\}^{1/2} \le T_1^{-1}Hb \le v + \{(s-1)\,w'w\,F_{\alpha_1}/(n - r^*)\}^{1/2} \tag{1.3.2a}$$

with the same probability, $1 - \alpha_1$, as in (1.3.1). For $s \ge 2$, (1.3.2) implies, but is not implied by,

$$|v| - \{(s-1)\,w'w\,F_{\alpha_1}/(n - r^*)\}^{1/2} \le |T_1^{-1}Hb| \le |v| + \{(s-1)\,w'w\,F_{\alpha_1}/(n - r^*)\}^{1/2}. \tag{1.3.3}$$

Hence (1.3.3) is true with probability no less than $1 - \alpha_1$, and possibly greater. Since the middle member of (1.3.3) is necessarily non-negative, the left member may be replaced by a suitably defined non-negative bound. Thus we let

$$l_1 = \begin{cases} [v'v]^{1/2} - [(s-1)\,w'w\,F_{\alpha_1}/(n - r^*)]^{1/2} & \text{if this is } > 0, \\ 0 & \text{otherwise}, \end{cases} \tag{1.3.4}$$

and

$$l_2 = [v'v]^{1/2} + [(s-1)\,w'w\,F_{\alpha_1}/(n - r^*)]^{1/2}.$$

The confidence-type statement obtained from (1.3.1) is

$$\Pr\{l_1 \le [b'H'(T_1T_1')^{-1}Hb]^{1/2} \le l_2\} \ge 1 - \alpha_1. \tag{1.3.5}$$


The bounds $l_1$ and $l_2$ are determined by the computable statistics $v$ and $w$ and the chosen $\alpha_1$. But since $[b'H'(T_1T_1')^{-1}Hb]^{1/2}$ is not a parameter of a distribution, (1.3.5) is not a genuine confidence statement. It is here called a "quasi-confidence" statement. Because it may be used to obtain a confidence statement ultimately, (1.3.5) may also be called a "preliminary" confidence statement.
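Once $v'v$, $w'w$ and the tabled point $F_{\alpha_1}$ (for $s - 1$ and $n - r^*$ degrees of freedom) are available, the bounds (1.3.4) are a two-line computation. A minimal sketch, with the function name and the numerical inputs purely illustrative:

```python
import math


def quasi_confidence_bounds(vv, ww, s, n, r_star, f_crit):
    """Bounds l1, l2 of (1.3.4) from v'v, w'w and the F point F_{alpha_1}.

    f_crit must be the upper alpha_1 point of F with (s - 1, n - r*) d.f.,
    taken from tables or a statistics library.
    """
    half_width = math.sqrt((s - 1) * ww * f_crit / (n - r_star))
    root = math.sqrt(vv)
    return max(root - half_width, 0.0), root + half_width


print(quasi_confidence_bounds(vv=16.0, ww=9.0, s=5, n=20, r_star=10, f_crit=2.5))
# (1.0, 7.0)
```

The `max(..., 0.0)` implements the rule that $l_1$ is replaced by zero whenever the difference would be negative.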

Since in practice $\alpha_1$ would be chosen to be small, the increase in the probability of (1.3.5) over the probability of (1.3.1) will be very small in comparison with $1 - \alpha_1$. This increase, by itself, need not be disturbing. What is disturbing is that the quasi-confidence interval, from $l_1$ to $l_2$, may be too wide. It is quite possible that quasi-confidence intervals narrower than that of (1.3.5) may be found in the near future, by others if not by us. In the meantime, in an application the interval of (1.3.5) may not be too wide from a practical standpoint. We have not tried to improve upon (1.3.5), since the main emphasis of this paper is on the inferences which can be made without assuming a distribution for $b$, working instead in terms of the substitute variate in the sense briefly explained in the introduction and to be developed in Section 1.5. For $s = 2$, that is, for the case of two blocks, it is always open to us to go back to (1.3.2a) and use, instead of (1.3.5),

$$\Pr\{l_1^* \le T_1^{-1}Hb \le l_2^*\} = 1 - \alpha_1, \tag{1.3.6}$$

where $l_1^*$ and $l_2^*$ are the left and right sides respectively of the inequality (1.3.2a). It is also open to us to make corresponding changes at all subsequent stages, keeping in mind the limitations, imposed by so small an $s$, that are stated after (1.5.2) and (1.6.1). We believe that the idea of a substitute variate, and the kind of use which will be made of it in this paper, could have much wider application, especially when dealing with unobservable stochastic variates with distributions about which it might be wise to make as few assumptions as possible.







1.4. Simplifications and restrictions. Although $T_1$ appears in the formal definition of $v$, (1.2.2) and (1.2.1) may be combined to obtain the sum of squares

$$v'v = y'M_*(M_*'M_*)^{-1}\begin{bmatrix} O \\ K'H' \end{bmatrix}\left\{[O,\ HK](M_*'M_*)^{-1}\begin{bmatrix} O \\ K'H' \end{bmatrix}\right\}^{-1}[O,\ HK](M_*'M_*)^{-1}M_*'y, \tag{1.4.1}$$

in which $T_1$ no longer appears explicitly. Moreover, $T_1$ enters into the middle member of (1.3.5) only in the symmetric matrix

$$H'(T_1T_1')^{-1}H. \tag{1.4.2}$$

Using the definition of $T_1$ in (1.2.2), (1.4.2) can be expressed in terms of $M_*$. Also $M_*'M_*$ can by means of (1.1.4) be expressed as

$$\begin{bmatrix} M_0'M_0 & M_0'M_1K \\ K'M_1'M_0 & K'M_1'M_1K \end{bmatrix}, \tag{1.4.3}$$

and then by (A.9) the inverse of this $(s+t-1)$th-order matrix can be expressed as a partitioned matrix whose submatrices involve inverses of only $t$th-order and $(s-1)$th-order matrices. Using this result, (1.4.1) reduces to

$$v'v = y'\{I - M_0[M_0'M_0]^{-1}M_0'\}M_1K\{K'M_1'M_1K - K'M_1'M_0[M_0'M_0]^{-1}M_0'M_1K\}^{-1}K'M_1'\{I - M_0[M_0'M_0]^{-1}M_0'\}y, \tag{1.4.4}$$

and (1.4.2) becomes

$$H'[K'H']^{-1}K'\{M_1'M_1 - M_1'M_0[M_0'M_0]^{-1}M_0'M_1\}K[HK]^{-1}H. \tag{1.4.5}$$

By means of (A.8), whose conditions are satisfied, (1.4.5) is further reduced to

$$M_1'M_1 - M_1'M_0[M_0'M_0]^{-1}M_0'M_1. \tag{1.4.6}$$


These simplifications have not required any more restrictive assumptions 
about the structure matrix than those set forth in Section 1.1 above. But even 
greater simplification is possible for certain types of experimental design. 

Suppose (1) every treatment is applied to $r$ experimental units (in $r$ distinct blocks). Suppose (2) every block contains $q$ experimental units (to which $q$ distinct treatments are applied). Then, for any design satisfying these two specifications, (1.4.6) reduces to

$$qI(s) - r^{-1}(M_0'M_1)'M_0'M_1. \tag{1.4.7}$$

As a further restriction, suppose (3) for every pair of blocks the number of treatments in common is $q(r-1)/(s-1)$; then (1.4.7) becomes

$$\delta^2[I(s) - s^{-1}J(s)], \tag{1.4.8}$$

where $\delta^2 = qs(r-1)/\{r(s-1)\}$. The constant $\delta^2$ is thus a positive rational number determined by the size of the experiment and the type of experimental design. It is accordingly called the "design constant" for a particular experiment. Designs which satisfy all three of these restrictions have been called "linked block" by Youden [16]. They include randomized block designs (for which $\delta^2 = q = t$), symmetric balanced incomplete block designs, and those partially balanced incomplete block designs which are duals of balanced incomplete block designs. Henceforth it is to be understood that the experimental design is one of these types which satisfy the linked block conditions.

The underlying motivation for these three restrictions on the design is to reduce (1.4.2) to (1.4.8), or, in other words, to make $b'H'(T_1T_1')^{-1}Hb$, which appears in the quasi-confidence statement (1.3.5), proportional to the corrected sum of squares of the block effects,

$$\sum_{j=1}^{s} b_j^2 - s^{-1}\Big(\sum_{j=1}^{s} b_j\Big)^2. \tag{1.4.9}$$

When these three restrictions do hold, (1.4.4) can be further simplified to

$$v'v = \delta^{-2}y'\{I - r^{-1}M_0M_0'\}M_1K\{I + J\}K'M_1'\{I - r^{-1}M_0M_0'\}y, \tag{1.4.10}$$

which requires no matrix inversion whatever. Moreover, when (1.4.8) is set equal to (1.4.2), it follows by (A.2) and (A.1) that

$$T_1 = \delta^{-1}I, \tag{1.4.11}$$

where the positive root of $\delta^2$ will make $T_1$ agree with the convention of having positive diagonal terms.
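The reduction from (1.4.7) to (1.4.8) can be checked in exact arithmetic on a small linked-block design. The sketch below uses, as a hypothetical example, the symmetric balanced incomplete block design with $t = s = 7$ and $r = q = 3$ (the Fano plane, in which every two blocks share exactly one treatment), together with a randomized block design. It forms $qI(s) - r^{-1}N'N$ with $N = M_0'M_1$ the treatment-by-block incidence matrix and compares the result with $\delta^2[I(s) - s^{-1}J(s)]$. All names are illustrative.

```python
from fractions import Fraction

# Hypothetical example: symmetric BIBD with t = s = 7, r = q = 3 (Fano plane).
BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]


def incidence(blocks, t):
    """t x s incidence matrix M0' M1: entry 1 if treatment i occurs in block j."""
    return [[1 if i in blk else 0 for blk in blocks] for i in range(t)]


def reduced_block_matrix(blocks, t, r, q):
    """(1.4.7): q I(s) - r^{-1} N'N, assuming replication r and block size q."""
    inc = incidence(blocks, t)
    s = len(blocks)
    ntn = [[sum(inc[i][a] * inc[i][b] for i in range(t)) for b in range(s)]
           for a in range(s)]
    return [[Fraction(q if a == b else 0) - Fraction(ntn[a][b], r) for b in range(s)]
            for a in range(s)]


def design_constant(q, s, r):
    """delta^2 = q s (r - 1) / {r (s - 1)}."""
    return Fraction(q * s * (r - 1), r * (s - 1))


def centered_identity(s, scale):
    """scale * [I(s) - s^{-1} J(s)]."""
    return [[scale * (Fraction(1 if a == b else 0) - Fraction(1, s)) for b in range(s)]
            for a in range(s)]


d2 = design_constant(3, 7, 3)
print(d2, reduced_block_matrix(BLOCKS, 7, 3, 3) == centered_identity(7, d2))  # 7/3 True

# Randomized blocks (t = q = 4, s = r = 3): delta^2 = q = t.
RCB = [(0, 1, 2, 3)] * 3
print(design_constant(4, 3, 3), reduced_block_matrix(RCB, 4, 3, 4) == centered_identity(3, 4))
```

For such designs, dividing $l_1$ and $l_2$ of (1.3.4) by $\delta$ turns (1.3.5) into a statement about the square root of the corrected sum of squares (1.4.9).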

1.5. Class marks and "substitute variates." The model specifies that the components of $b$ are a sample from an unknown distribution that is presumably either continuous or discrete with a sufficient number of values. Some measure of the dispersion of that distribution is sought, but the variance, however suitable for this purpose in the case of normal distributions, may be inappropriate for other distributions. However, any continuous distribution has uniquely determined quantiles, for which we introduce the following notation: ${}_m\beta_n$ will denote the $n$th $m$-tile of the distribution of the variate $b$. Thus ${}_2\beta_1$ denotes the median; ${}_4\beta_1$ and ${}_4\beta_3$ denote the odd quartiles, etc. These quantiles may be used as class boundaries and class marks for approximating the unknown continuous distribution by a discrete distribution as follows. Suppose the quartiles of $b$ were known: ${}_4\beta_0$, ${}_4\beta_1$, ${}_4\beta_2$, ${}_4\beta_3$, ${}_4\beta_4$. The even quartiles ${}_4\beta_0$, ${}_4\beta_2$, and ${}_4\beta_4$, used as class boundaries, would lump all possible values of $b$ into two classes. Instead of the mid-range of each class, the median of each class could be used as the class mark, and a new variate taking on only these discrete class marks as its values would be a rough approximation to the original variate $b$. Since this new variate in a sense replaces the original, it will be called the "substitute variate" and will be denoted by ${}_2b$ if its two possible values are the odd quartiles of $b$. The same remarks would apply roughly for a discrete distribution with a sufficient number of values.

This example of the quartiles may be extended to sextiles, octiles, ..., or $2k$-tiles. The even quantiles ${}_{2k}\beta_0, {}_{2k}\beta_2, \ldots, {}_{2k}\beta_{2k}$ could be made class boundaries, and the odd quantiles ${}_{2k}\beta_1, {}_{2k}\beta_3, \ldots, {}_{2k}\beta_{2k-1}$ taken as the class marks. These $k$ discrete values would thus be the only values of a substitute variate denoted by ${}_kb$. Since these class boundaries are not arbitrarily chosen but are the even $2k$-tiles of the population of $b$, each class has the same a priori probability, viz., $1/k$, regardless of the shape of the density curve of $b$. Thus the substitute variate ${}_kb$ has $k$ equally probable values no matter what the unknown distribution of $b$. In other words,

$$\Pr\{{}_kb = {}_{2k}\beta_{2n-1}\} = 1/k, \qquad n = 1, 2, \ldots, k. \tag{1.5.1}$$

Of course we are not interested in the probabilities $1/k$, which are known, but in the unknown values ${}_{2k}\beta_{2n-1}$. The actual values of the quantiles of $b$ would certainly be desirable but seem no more inferable than the actual block effects or treatment effects. On the other hand, just as contrasts among fixed effects may be estimated and confidence bounds placed on them, so may differences between values of the substitute variate be estimated and have confidence bounds placed on them. The interquartile range, ${}_4\beta_3 - {}_4\beta_1$, is one such difference. And for $k$ an integer greater than 2,

$${}_{2k}\beta_{2m+1} - {}_{2k}\beta_{2m-1}, \qquad m = 1, 2, \ldots, k-1, \tag{1.5.2}$$

constitute a set of $k - 1$ interquantile differences, which would reveal more and more about the distribution of $b$ as $k$ is increased. But it seems reasonable, and is also easy to check, that from $s$ blocks we cannot estimate these $k - 1$ differences unless $k \le s$.
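To make the substitute variate concrete, suppose, purely for illustration (in the model the distribution of $b$ is unknown), that $b$ were standard normal. The class marks of ${}_kb$ are then the odd $2k$-tiles, and the differences of (1.5.2) are differences of consecutive marks. The sketch uses `statistics.NormalDist.inv_cdf` from Python's standard library for the quantiles; the helper names are hypothetical.

```python
from statistics import NormalDist


def substitute_values(dist, k):
    """Class marks of the substitute variate _k b: the odd 2k-tiles of `dist`.

    Each mark has probability 1/k, because the class boundaries are the even 2k-tiles.
    """
    return [dist.inv_cdf((2 * n - 1) / (2 * k)) for n in range(1, k + 1)]


def interquantile_differences(marks):
    """The k - 1 differences of (1.5.2) between consecutive class marks."""
    return [b - a for a, b in zip(marks, marks[1:])]


marks = substitute_values(NormalDist(), 2)
print(marks)                             # odd quartiles, about [-0.674, 0.674]
print(interquantile_differences(marks))  # interquartile range, about [1.349]
```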

1.6. Bounding loci in the space of 2k-tile differences. The quasi-confidence 
statement (1.3.5) contained a quadratic form in the components of b, a quad- 
ratic form whose matrix was expressible as (1.4.2) or (1.4.6) and then for a 
large class of useful designs was further simplified to (1.4.8). Temporarily ig- 
noring the scalar design constant, 8’, we shall now investigate the quadratic form 
(1.6.1) {ab]’[I(s) — s-'J(s)}[osb]. 

In this expression the s (unknown) values of the original variate b have been
replaced by the k (unknown) values of the substitute variate ${}_k b$. For s > k,
some of these discrete values must occur more than once. Hence we denote by
$s_{2m-1}$ the frequency of the value ${}_k\beta_{2m-1}$, for m = 1, 2, ..., k, among the s
components of ${}_k b$. Of course $\sum_{m=1}^{k} s_{2m-1} = s$. Using (A.10), where $y_i$ is replaced by
${}_k\beta_{2n-1}$ and $x_i$ by $s_{2n-1}$, with summation running from n = 1 to n = k, and using
(A.11), where $z_n = {}_k\beta_{2n+1} - {}_k\beta_{2n-1}$, we can express (1.6.1) as the quadratic
form

(1.6.2)  $[{}_k d]'G[{}_k d],$

where the vector ${}_k d$ has its k − 1 components defined by

(1.6.3)  ${}_k d_n = {}_k\beta_{2n+1} - {}_k\beta_{2n-1}, \qquad n = 1, 2, \ldots, k-1,$





950 8. N. ROY AND WHITFIELD COBB 


and where the symmetric matrix G has in its ith row and jth column, for $i \ge j$,

(1.6.4)  $G_{ij} = s^{-1}\Big(\sum_{m=1}^{j} s_{2m-1}\Big)\Big(\sum_{m=i+1}^{k} s_{2m-1}\Big).$
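A numerical check of the step from (1.6.1) to (1.6.2) may help. The sketch below builds G from (1.6.4) for a hypothetical set of frequencies, forms ${}_k d$ by (1.6.3) from hypothetical class values, and compares $[{}_k d]'G[{}_k d]$ with (1.6.1) evaluated on the s-vector in which each class value is repeated according to its frequency. Exact rational arithmetic (`fractions.Fraction`) is used so that the two sides can be compared with `==`; all names are illustrative.

```python
from fractions import Fraction


def g_matrix(freqs):
    """Matrix G of (1.6.4), in exact rational arithmetic.

    freqs[n] (0-based) is the frequency s_{2n+1} of the (n+1)th value of the
    substitute variate, so k = len(freqs) and s = sum(freqs). For 0-based
    indices i >= j the entry is (freqs[0] + ... + freqs[j])
    * (freqs[i+1] + ... + freqs[k-1]) / s.
    """
    k, s = len(freqs), sum(freqs)
    G = [[Fraction(0)] * (k - 1) for _ in range(k - 1)]
    for i in range(k - 1):
        for j in range(i + 1):
            G[i][j] = G[j][i] = Fraction(sum(freqs[: j + 1]) * sum(freqs[i + 1:]), s)
    return G


def quadratic_form(d, M):
    """d'Md for a vector d and a square matrix M given as nested lists."""
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def centered_form(b):
    """b'[I(s) - s^{-1}J(s)]b, i.e. the form (1.6.1) for a vector b of length s."""
    s = len(b)
    return sum(Fraction(x) ** 2 for x in b) - Fraction(sum(b)) ** 2 / s


# Hypothetical example: k = 4 classes with values 0, 2, 3, 7 and frequencies 2, 1, 3, 1.
betas, freqs = [0, 2, 3, 7], [2, 1, 3, 1]
b = [v for v, f in zip(betas, freqs) for _ in range(f)]       # the s = 7 components
d = [betas[n + 1] - betas[n] for n in range(len(betas) - 1)]  # (1.6.3)
print(centered_form(b), quadratic_form(d, g_matrix(freqs)))   # 236/7 236/7
```

Setting a middle frequency to zero, as in `g_matrix([2, 0, 3, 1])`, makes two rows of G identical, so the form then depends on ${}_k d_1$ and ${}_k d_2$ only through their sum; this is the "flattening" discussed later in this section.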


In terms of the original model, the vector ${}_k d$ consists of parameter-like unknown
constants, viz., interquantile differences of the distribution from which
the components of b are a sample. The set of integers $s_1, s_3, \ldots, s_{2k-1}$,
hereafter denoted by $\mathbf{s}$, represents the hypothetical frequencies of the distinct values
of the substitute variate ${}_k b$ corresponding to the s (different) values of the
original variate b. But for the purpose of obtaining confidence bounds on the
components of ${}_k d$, we shall think of the components of $\mathbf{s}$ as given (constants)
and of the components of ${}_k d$ as (variable) coordinates of a point in a
(k − 1)-dimensional parameter space. When (1.6.1) is thus regarded as a function of the
k − 1 components of ${}_k d$, a function whose coefficients are determined by a
given partition of s into $\mathbf{s}(k)$, (1.6.1) will be denoted by $A_{\mathbf{s}}(d)$. Under this
interpretation


(1.6.5)  $A_{\mathbf{s}}(d) = \text{constant}$


determines a locus in the (k − 1)-dimensional space whose coordinates are the
components of ${}_k d$. For a given constant term and all possible partitions of a
given s, (1.6.5) would determine a discrete family of such loci. The several loci
in the family may be classified as bounding or not according to whether they
would inclose a bounded region of points having non-negative coordinates. (In
all subsequent use of “bounded region” it will be understood to exclude points
with any negative coordinate. Thus the bounded region has for its boundary not
only the locus of (1.6.5) but also, for k = 2, the origin; for k = 3, the $d_1 = 0$
and $d_2 = 0$ axes; for k = 4, the $d_1 = 0$, $d_2 = 0$, $d_3 = 0$ planes; for k = 5, the
$d_1 = 0$, $d_2 = 0$, $d_3 = 0$, $d_4 = 0$ hyperplanes.) For a given k, a given s, and a given
positive constant, there is a locus, denoted by $A_o(d) = \text{constant}$, such that it
incloses a region which is the union of the regions inclosed by all the bounding
loci in the family determined by (1.6.5). This locus is called the outer boundary.
Similarly there is an inner boundary, denoted by $A_i(d) = \text{constant}$, which
incloses a region which is the intersection of all regions inclosed by the bounding
loci. For some values of k, the outer boundary and the inner boundary are
themselves members of the family determined by (1.6.5), whereas for other values of
k, these two boundaries are composites of more than one bounding locus.

From (1.6.4) and (A.12) it follows that if $s_{2n-1} \ne 0$ for all n = 1, 2, ..., k,
then (1.6.2) is positive definite, and the locus of (1.6.5) would be two points
if k = 2, an ellipse if k = 3, an ellipsoid if k = 4, etc., all bounding loci.
But if $s_1 = 0$ or $s_{2k-1} = 0$, the coefficient of ${}_k\beta_1$ or ${}_k\beta_{2k-1}$ is zero in (1.6.1), and
hence, by (A.10), ${}_k d_1$ or ${}_k d_{k-1}$ vanishes from the quadratic form (1.6.2). When
such is the case, the locus of (1.6.5) provides no bound on the corresponding
coordinate.

On the other hand, if $s_{2m-1} = 0$ for $1 \ne m \ne k$, then ${}_k\beta_{2m-1}$ vanishes from
the quadratic form (1.6.1), and it might seem that both $d_{m-1}$ and $d_m$ would





MIXED MODEL VARIANCE ANALYSIS: PART I 951 


vanish from the quadratic form (1.6.2), so that neither $d_{m-1}$ nor $d_m$ would be
bounded by the locus of (1.6.5). However, if $s_{2m-3} \ne 0$ and $s_{2m+1} \ne 0$, we know
by (A.10) that (1.6.1) can be expressed in terms of

$({}_k\beta_3 - {}_k\beta_1), ({}_k\beta_5 - {}_k\beta_3), \ldots, ({}_k\beta_{2m+1} - {}_k\beta_{2m-3}), \ldots, ({}_k\beta_{2k-1} - {}_k\beta_{2k-3}),$

and by (1.6.3) these are

${}_k d_1, {}_k d_2, \ldots, {}_k d_{m-1} + {}_k d_m, \ldots, {}_k d_{k-1}.$

Note that here the sum ${}_k d_{m-1} + {}_k d_m$, rather than ${}_k d_{m-1}$ and ${}_k d_m$ separately,
appears in (1.6.1). Hence the locus of (1.6.5) is “flattened” if $s_{2m-1} = 0$ but is
still bounding. By an obvious extension of this argument, it follows that the
locus of (1.6.5) will be bounding when $s_{2m-1} = 0$ for more than one value of m,
provided only that $s_1 \ne 0$ and $s_{2k-1} \ne 0$. This is seen to be plausible even on
the rough consideration that a bound on $({}_k\beta_{2k-1} - {}_k\beta_1)$ necessarily imposes
bounds on all intermediate 2k-tile differences, and basically this is what is
formalized in the above argument.

From the set of bounding loci for each value of k, an inner and an outer boundary
must be found. As defined above, “inner” and “outer” are designations applied
to loci by virtue of extreme properties of the matrix of coefficients, regardless of
the constant term in the equation. In Appendix B the matrices of the quadratic
forms $A_i(d)$ and $A_o(d)$ are derived from the matrix G. For any k,


(1.6.6)  $A_i(d) = \dfrac{s^2 - 1}{4s}\Big(\sum_{n=1}^{k-1} {}_k d_n\Big)^2$ for s odd,
         $A_i(d) = \dfrac{s}{4}\Big(\sum_{n=1}^{k-1} {}_k d_n\Big)^2$ for s even;

(1.6.7)  $A_o(d) = \dfrac{s-1}{s}\sum_{n=1}^{k-1} ({}_k d_n)^2 + \dfrac{2}{s}\sum_{n=1}^{k-2}\sum_{m=n+1}^{k-1} ({}_k d_n)({}_k d_m).$

Thus the inner boundary is a point for k = 2, a straight line for k = 3, a plane
for k = 4, and a hyperplane for k = 5. The outer boundary is a point for k = 2,
an ellipse for k = 3, an ellipsoid for k = 4, and a hyperellipsoid for k = 5.

Since, for $2 \le k \le s$, the k non-negative integers into which s is partitioned
are hypothetical frequencies of the k equiprobable values ${}_k\beta_{2n-1}$, each partition
of s has an a priori probability

(1.6.8)  $\dfrac{s!}{k^s \prod_{n=1}^{k} s_{2n-1}!}.$

But all that is needed now is the a priori probability, say γ, that a locus
obtained from (1.6.5) be bounding. Since $s_1 \ne 0$ and $s_{2k-1} \ne 0$ are necessary and
sufficient conditions for bounding loci,

(1.6.9)  $\gamma = 1 - \Pr\{s_1 = 0\} - \Pr\{s_{2k-1} = 0\} + \Pr\{s_1 = s_{2k-1} = 0\}$
         $\phantom{\gamma} = 1 - 2^{-s} - 2^{-s} + 0 = 1 - 2^{1-s}$ for k = 2,
         $\phantom{\gamma} = 1 - 2[(k-1)/k]^s + [(k-2)/k]^s$ for k > 2.







This γ is thus the a priori probability that a point whose Cartesian coordinates
are the components of ${}_k d$ should lie on any one of the bounding loci. Hence the
probability is not less than γ that such a point should lie between the inner and
outer boundaries or on one of these.
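The closed form (1.6.9) can be cross-checked by summing the a priori probabilities (1.6.8) over every partition of s into k non-negative frequencies whose first and last entries are non-zero. The brute-force enumeration below is only practical for small k and s and is meant as a sanity check, not as a method taken from the text; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def compositions(s, k):
    """All k-tuples of non-negative integers summing to s (partitions of s into k classes)."""
    if k == 1:
        yield (s,)
        return
    for first in range(s + 1):
        for rest in compositions(s - first, k - 1):
            yield (first,) + rest


def partition_probability(freqs):
    """A priori probability (1.6.8) of the frequencies s_1, s_3, ..., s_{2k-1}."""
    s, k = sum(freqs), len(freqs)
    return Fraction(factorial(s), k ** s * prod(factorial(f) for f in freqs))


def gamma_closed_form(k, s):
    """Probability (1.6.9) that s_1 != 0 and s_{2k-1} != 0."""
    if k == 2:
        return 1 - Fraction(2, 2 ** s)  # 1 - 2^(1-s)
    return 1 - 2 * Fraction(k - 1, k) ** s + Fraction(k - 2, k) ** s


def gamma_by_enumeration(k, s):
    """Sum of (1.6.8) over the partitions whose first and last frequencies are non-zero."""
    return sum(
        (partition_probability(f) for f in compositions(s, k) if f[0] > 0 and f[-1] > 0),
        Fraction(0),
    )


print(gamma_closed_form(3, 4), gamma_by_enumeration(3, 4))  # 50/81 50/81
```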

1.7. The final confidence statement. The quasi-confidence statement (1.3.5)
asserts that with probability ≥ 1 − α two computable quantities, $l_1$ and $l_2$, are
bounds for a certain quadratic form in the unobservable random effects b, which
form can be reduced, for many designs, to $\delta^2 b'[I - s^{-1}J]b$. In Section 1.6 it has
been shown that, with the same degree of approximation which results from
grouping continuous data into k classes, the continuous variate b may be
replaced by the k-valued substitute variate ${}_k b$. Thus the quasi-confidence bounds
would apply to $\delta^2[{}_k b]'[I - s^{-1}J][{}_k b]$ or the equivalent form $\delta^2[{}_k d]'G[{}_k d]$. With
a priori probability ≥ γ given by (1.6.9), the points whose coordinates ${}_k d$
satisfy the equation $[{}_k d]'G[{}_k d] = c$ lie in a region bounded by $A_i(d) = c$ and
$A_o(d) = c$. Because of the postulated independence of b and e, we can combine
this a priori probability statement about $A_i(d)$ and $A_o(d)$ with the
quasi-confidence statement (1.3.5) to state, with confidence coefficient ≥ (1 − α)γ, that
the k − 1 differences between successive odd 2k-tiles of the distribution of b
are coordinates of a point lying in the region bounded by $\delta^2 A_i(d) = l_1$ and
$\delta^2 A_o(d) = l_2$.
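Read as a membership test, the final statement says that a point ${}_k d$ with non-negative coordinates is covered when $\delta^2 A_i(d) \ge l_1$ and $\delta^2 A_o(d) \le l_2$, with $A_i$ and $A_o$ as in (1.6.6) and (1.6.7); this is the reading that reproduces (1.7.1) and (1.7.2). The sketch below encodes it in floating point. The function names and the numbers in the example are hypothetical, and $l_1$, $l_2$ are assumed to have been computed already.

```python
def inner_form(d, s):
    """A_i(d) of (1.6.6): c_s * (sum of d)^2, c_s = s/4 (s even) or (s^2 - 1)/(4s) (s odd)."""
    c = s / 4 if s % 2 == 0 else (s * s - 1) / (4 * s)
    return c * sum(d) ** 2


def outer_form(d, s):
    """A_o(d) of (1.6.7): ((s-1)/s) * sum d_n^2 + (2/s) * sum over n < m of d_n d_m."""
    squares = sum(x * x for x in d)
    cross = (sum(d) ** 2 - squares) / 2  # sum of d_n d_m over n < m
    return (s - 1) / s * squares + 2 / s * cross


def in_region(d, s, delta, l1, l2):
    """True if d >= 0 lies between the loci delta^2 A_i(d) = l1 and delta^2 A_o(d) = l2."""
    if any(x < 0 for x in d):
        return False
    return delta ** 2 * inner_form(d, s) >= l1 and delta ** 2 * outer_form(d, s) <= l2


# Hypothetical values: s = 4 blocks, delta = 1, l1 = 1, l2 = 10.
print(in_region([1.0, 1.0], 4, 1.0, 1.0, 10.0))  # True
```

For k = 3 and s even, multiplying the outer condition by s gives exactly the first inequality of (1.7.2), and taking square roots in the inner condition gives the second.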

To make these final confidence bounds more explicit, let us consider in detail
k = 2 and k = 3. For k = 2, we refer to (1.6.6) and (1.6.7), supposing s even.
Combining the two equations giving inner and outer boundaries, we obtain a
confidence interval for the interquartile range:

(1.7.1)  $\delta^{-1}[(4/s)\,l_1]^{1/2} \le {}_2\beta_3 - {}_2\beta_1 \le \delta^{-1}[s\,l_2/(s-1)]^{1/2}.$
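Under the same assumptions (k = 2, s even, and $l_1$, $l_2$, δ already available), the interval (1.7.1) is a two-line computation. The function name and the numbers in the example are hypothetical.

```python
from math import sqrt


def interquartile_interval(l1, l2, s, delta):
    """Confidence interval (1.7.1) for the interquartile range, k = 2, s even.

    lower = delta^{-1} * [(4/s) * l1]^{1/2},  upper = delta^{-1} * [s * l2 / (s - 1)]^{1/2}.
    """
    if s % 2:
        raise ValueError("(1.7.1) is stated for even s")
    return sqrt(4 * l1 / s) / delta, sqrt(s * l2 / (s - 1)) / delta


print(interquartile_interval(1.0, 3.0, 4, 1.0))  # (1.0, 2.0)
```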


For k = 3, we refer to (1.6.6) and (1.6.7), again supposing s to be even. This
time we obtain simultaneous confidence bounds on two interquantile ranges,
${}_3\beta_3 - {}_3\beta_1$ and ${}_3\beta_5 - {}_3\beta_3$. If these two parameters are represented respectively
by the abscissa x and the ordinate y of a point, then the confidence region is given by

(1.7.2)  $(s-1)x^2 + 2xy + (s-1)y^2 \le s\,\delta^{-2} l_2,$
         $x + y \ge \delta^{-1}[(4/s)\,l_1]^{1/2},$
         $x \ge 0,$
         $y \ge 0.$


2. The General Uniresponse Mixed-Model Problem with Two or More Factors.

2.1. Fixed effects in a two-factor mixed model. The previous sections have been
almost exclusively concerned with the random effects in a uniresponse model
with one factor represented by fixed effects and one factor by random effects;
but the fixed effects may be of greater interest to the experimenter. With this
in mind, we pointed out that the statistic u defined in (1.2.1) is entirely
independent of the random effects b (though not of the statistic v) and is quite
simply related to the fixed effects a. Suppose now it is desired to estimate a
particular contrast among the fixed effects, say $c'a$ where $c'j = 0$. Then


(2.1.1)  $c'H'T_0u$


is an unbiased estimator of $c'a$. Just as $T_1$ did not need to be determined
explicitly in order to obtain $v'v$, so here (2.1.1) can be computed without finding
$T_0$ explicitly. Now regardless of whether b (hence also y) is normally distributed,
u is normal and hence (2.1.1) is also. The variance of (2.1.1) is the same whether
b consists of fixed effects or random effects. Thus if $t_\alpha$ is the upper ½α point of
the t-distribution with n − r* d.f., we can assert with confidence coefficient
1 − α that

(2.1.2)  $c'H'T_0u - t_\alpha[c'Rc\,w'w/(n - r^*)]^{1/2} \le c'a \le c'H'T_0u + t_\alpha[c'Rc\,w'w/(n - r^*)]^{1/2},$

where

(2.1.3)  $R = [M_0'M_0 - M_0'M_1K(K'M_1'M_1K)^{-1}K'M_1'M_0]^{-1}.$


For neither the point estimate (2.1.1) nor the confidence interval (2.1.2) is any 
further restriction on design necessary. But for some designs, especially ran- 
domized block, R will be somewhat simpler than (2.1.3). 
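Once $c'H'T_0u$, $c'Rc$ and $w'w$ have been computed, (2.1.2) is simple arithmetic. In the sketch below these quantities are assumed to be given, and $t_\alpha$ is passed in as an argument because the Python standard library has no Student t quantile function (it would be read from tables or computed elsewhere). The function name and numbers are hypothetical.

```python
from math import sqrt


def contrast_interval(estimate, c_r_c, w_w, error_df, t_alpha):
    """Interval (2.1.2) for a contrast c'a, with c'j = 0.

    estimate : computed value of c'H'T_0 u, the estimator (2.1.1)
    c_r_c    : the scalar c'Rc, with R as in (2.1.3)
    w_w      : the error sum of squares w'w
    error_df : n - r*
    t_alpha  : upper alpha/2 point of Student's t with n - r* d.f.
    """
    half_width = t_alpha * sqrt(c_r_c * w_w / error_df)
    return estimate - half_width, estimate + half_width


# Hypothetical numbers, for illustration only.
print(contrast_interval(1.0, 0.5, 8.0, 4, 2.0))  # (-1.0, 3.0)
```

The interval does not involve b at all, which reflects the point made above: its validity does not depend on the distribution of the random effects.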

Suppose now that the hypothesis of equality of all fixed effects a is to be
tested. Then the test statistic, $[u'u/(t-1)]/[w'w/(n - r^*)]$, has the central
F distribution if and only if this hypothesis is true, quite irrespective of the
distribution of b. On the other hand, it may be desired to test a more general
hypothesis, Ca = 0, where C(g × t) is of rank $g \le t - 1$. The hypothesis is


untestable if the rank of m »< |i q +n — r*. The hypothesis is said to be 


’ , 
completely testable or weakly testable according as the rank of ly hg ] is 


equal to or greater than n — r* (and less than g + n — r*). In either testable 
case the test statistic, $(u'T_0'HC'[CRC']^{-1}CH'T_0u/g)/(w'w/(n - r^*))$, has the
central F distribution, with g and n − r* d.f., if and only if Ca = 0. But if the
hypothesis is weakly testable, the test will have the same power for a possibly 
weaker hypothesis as for the hypothesis being tested. 

2.2. A mixed model with more than two factors. From the two-factor mixed
model of Section 1.1, it is an easy extension to a multifactor mixed model in 
which the observed response of each experimental unit is the sum of one (fixed) 
treatment effect, m (random) block effects, and a normally distributed error. 
Instead of a random sample of size s from a single distribution of block effects, 
we now postulate m distributions of block effects, mutually independent and 
independent of the normal error. The formal relation of the observations y to 
these unobservables of the model may still be expressed as in (1.1.1), but now 







the structure matrix has m + 1 submatrices instead of just two, each consisting
of all zero elements except for a single unity in each row. Thus

$M(n \times (s + t)) = [M_0(n \times t), M_1(n \times g_1), M_2(n \times g_2), \ldots, M_m(n \times g_m)]$

and

$b'(s) = [b_1'(g_1), b_2'(g_2), \ldots, b_m'(g_m)],$

where $b_i$ is a sample of size $g_i$ from the ith distribution of block effects.

Although this is a stronger restriction than is necessary, when m > 1 we have
restricted our investigation to orthogonal designs of two well-known types:
the complete factorial type, in which $n = t g_1 g_2 \cdots g_m$, and the Latin Square
type, in which $t = g_1 = g_2 = \cdots = g_m$ and $n = t^2$. For both types the structure
matrix will have a basis $M_*(n \times (s + t - m))$ such that $M_*'M_*$ and
$[M_*'M_*]^{-1}$ are formally expressible in terms of r, t, $g_1, \ldots, g_m$. Then
premultiplying $[M_*'M_*]^{-1}M_*'y$ by


' 
[rn —-1)xt4+ (“‘) H((g: — 1) X gi) K(gi & (gi: — 1)) + 


4 ”~ 
cont ) H((gm — 1) X gm) K(gm X (gm — 1)) | 


effects an orthonormal transformation on the observations y and defines m+ 1 
vectors of statistics, which vectors are mutually independent, each vector in- 
volving the effects of only one of the m + 1 factors and, of course, the normal 
error. Thus analogous to (1.2.9) we now have 


$u(t-1) = r\,H((t-1) \times t)\,a(t) + L_0e,$
$v_1(g_1-1) = \delta_1 H((g_1-1) \times g_1)\,b_1(g_1) + L_1e,$
$\cdots\cdots$
$v_m(g_m-1) = \delta_m H((g_m-1) \times g_m)\,b_m(g_m) + L_me,$

where $\delta_i = rt/g_i$ and $L_i((g_i - 1) \times n)$ is orthonormal. There is also a
statistic w(n − r*), relevant to error only, defined as in (1.2.7) except that r*,
the rank of the structure matrix, is now s + t — m. Each v; can be used, just 
as v was used in Section 1.3 and Section 1.7, with w to obtain quasi-confidence 
bounds and then final confidence bounds on the 2k-tile differences of each ran- 
dom-effects factor. Analogous extension to a model with more than one fixed- 
effects factor is also possible. 


APPENDIX A 
Miscellaneous Lemmas 
Proofs will not be given where direct verification is straightforward. 


(A.1)  $H((m-1) \times m)\,j(m) = 0(m-1).$
(A.2)  $H((m-1) \times m)\,H'(m \times (m-1)) = I(m-1).$







(A.3)  $H'(m \times (m-1))\,H((m-1) \times m) = I(m) - m^{-1}J(m).$
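The lemmas hold for any H((m − 1) × m) whose rows are orthonormal and orthogonal to j(m). Taking, as an assumption for illustration, the normalized Helmert matrix (the text does not say which such H it uses), (A.1) to (A.3) can be checked numerically in plain Python; the helper names are hypothetical.

```python
from math import sqrt


def helmert(m):
    """(m-1) x m normalized Helmert matrix: row r is (1, ..., 1, -r, 0, ..., 0)/sqrt(r(r+1))."""
    rows = []
    for r in range(1, m):
        norm = sqrt(r * (r + 1))
        rows.append([1 / norm] * r + [-r / norm] + [0.0] * (m - r - 1))
    return rows


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two equally shaped matrices."""
    return len(A) == len(B) and all(
        len(ra) == len(rb) and all(abs(a - b) <= tol for a, b in zip(ra, rb))
        for ra, rb in zip(A, B)
    )


m = 5
H = helmert(m)
ones = [[1.0] for _ in range(m)]                                            # j(m)
eye = [[float(i == j) for j in range(m - 1)] for i in range(m - 1)]         # I(m-1)
centering = [[float(i == j) - 1 / m for j in range(m)] for i in range(m)]   # I(m) - J(m)/m
print(close(matmul(H, ones), [[0.0]] * (m - 1)),   # (A.1)
      close(matmul(H, transpose(H)), eye),         # (A.2)
      close(matmul(transpose(H), H), centering))   # (A.3); prints True True True
```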

(A.4) Postmultiplying A(m × n) by K(n × (n − 1)) removes the nth column
of A.
Premultiplying A(m × n) by K'((m − 1) × m) removes the mth row
of A.

(A.5) Postmultiplying A(m × n) by K'(n × (n + 1)) adjoins 0(m) as an
(n + 1)th column.
Premultiplying A(m × n) by K((m + 1) × m) adjoins 0'(n) as an
(m + 1)th row.

(A.6) Any matrix A can be partitioned into [AK, (A − AKK')j].

(A.7) If Aj = 0, A = [AK, −AKj].
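Properties (A.4) and (A.5) determine K(n × (n − 1)) as the identity I(n) with its last column deleted, so that K' is I(n) with its last row deleted. Under that reading (an inference from the stated properties, not a definition quoted from the text), the sketch below checks (A.4) to (A.7) on small integer matrices; the helper names are hypothetical.

```python
def K(n):
    """n x (n-1) matrix: I(n) with its last column deleted."""
    return [[int(i == j) for j in range(n - 1)] for i in range(n)]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def hcat(*blocks):
    """Place matrices with the same number of rows side by side."""
    return [[x for row in rows for x in row] for rows in zip(*blocks)]


A = [[1, 2, 3], [4, 5, 6]]   # A(m x n) with m = 2, n = 3
m, n = 2, 3
j = [[1] for _ in range(n)]  # j(n) as a column

AK = matmul(A, K(n))
print(AK)                               # (A.4): [[1, 2], [4, 5]]
print(matmul(transpose(K(m)), A))       # (A.4): [[1, 2, 3]]
print(matmul(A, transpose(K(n + 1))))   # (A.5): [[1, 2, 3, 0], [4, 5, 6, 0]]
print(matmul(K(m + 1), A))              # (A.5): [[1, 2, 3], [4, 5, 6], [0, 0, 0]]

# (A.6): A = [AK, (A - AKK')j]
residual = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(A, matmul(AK, transpose(K(n))))]
assert hcat(AK, matmul(residual, j)) == A

# (A.7): if Bj = 0 then B = [BK, -BKj]
B = [[1, 2, -3], [4, -1, -3]]
BK = matmul(B, K(3))
assert hcat(BK, [[-x] for (x,) in matmul(BK, [[1], [1]])]) == B
```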

(A.8) If Cj = 0, where C(m) is symmetric and of rank m − 1, then
$C = H'\{HK[K'CK]^{-1}K'H'\}^{-1}H = H'(K'H')^{-1}K'CK(HK)^{-1}H.$

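(A.9) If A(m) and D(n) are both nonsingular (and the inverses on the right exist),

$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & (BD^{-1}C - A)^{-1}BD^{-1} \\ (CA^{-1}B - D)^{-1}CA^{-1} & (D - CA^{-1}B)^{-1} \end{bmatrix}.$

This is stated as an exercise in [1]. It may be derived by solving four
simultaneous matrix equations or simply verified by postmultiplying the two
members by $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ and then factoring each of the four combinations so obtained.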


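(A.10)  $\sum_{i=1}^{m} x_i y_i^2 - \Big(\sum_{i=1}^{m} x_i y_i\Big)^2 \Big/ \sum_{i=1}^{m} x_i = \sum_{i=1}^{m-1}\sum_{j=i+1}^{m} x_i x_j (y_i - y_j)^2 \Big/ \sum_{i=1}^{m} x_i.$

Proof: For $\sum_{i=1}^{m} x_i \ne 0$, the above is equivalent to

$\Big(\sum_{i=1}^{m} x_i\Big)\Big(\sum_{i=1}^{m} x_i y_i^2\Big) = \Big(\sum_{i=1}^{m} x_i y_i\Big)^2 + \sum_{i=1}^{m-1}\sum_{j=i+1}^{m} x_i x_j (y_i - y_j)^2,$

whose right member can be rearranged as follows:

$\Big(\sum_i x_i y_i\Big)^2 + \sum_{i=1}^{m-1}\sum_{j=i+1}^{m} x_i x_j (y_i^2 - 2y_i y_j + y_j^2)$
$= \Big(\sum_i x_i y_i\Big)^2 + \sum_i\sum_j (1 - \delta_{ij})\,x_i x_j y_j^2 - \sum_i\sum_j (1 - \delta_{ij})\,x_i x_j y_i y_j$
$= \Big(\sum_i x_i y_i\Big)^2 + \sum_i x_i \sum_j x_j y_j^2 - \sum_j x_j^2 y_j^2 - \sum_i x_i y_i \sum_j x_j y_j + \sum_i x_i^2 y_i^2$
$= \sum_{i=1}^{m}\sum_{j=1}^{m} x_i x_j y_j^2,$

which is obviously the same as the left member. (The Kronecker delta $\delta_{ij} = 1$
if i = j and 0 otherwise.)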
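A randomized check with exact rational arithmetic is a quick way to gain confidence in (A.10) before it is used in Section 1.6. The sketch below draws positive weights $x_i$ (so that $\sum x_i \ne 0$) and arbitrary rational $y_i$ from a seeded generator; the function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations


def a10_left(x, y):
    """sum x_i y_i^2 - (sum x_i y_i)^2 / sum x_i."""
    sx = sum(x)
    return sum(a * b * b for a, b in zip(x, y)) - sum(a * b for a, b in zip(x, y)) ** 2 / sx


def a10_right(x, y):
    """Sum over i < j of x_i x_j (y_i - y_j)^2, divided by sum x_i."""
    pairs = combinations(range(len(x)), 2)
    return sum(x[i] * x[j] * (y[i] - y[j]) ** 2 for i, j in pairs) / sum(x)


rng = random.Random(1960)  # fixed seed, so the check is reproducible
for _ in range(200):
    m = rng.randint(2, 8)
    x = [Fraction(rng.randint(1, 9)) for _ in range(m)]
    y = [Fraction(rng.randint(-20, 20), rng.randint(1, 5)) for _ in range(m)]
    assert a10_left(x, y) == a10_right(x, y)
print("(A.10) holds in all trials")
```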
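(A.11)  $\sum_{i=1}^{m-1}\sum_{j=i+1}^{m} x_i x_j \Big(\sum_{n=i}^{j-1} z_n\Big)^2$ is a quadratic form in $z_1, \ldots, z_{m-1}$ whose
(symmetric) matrix has, for $p \ge q$, $\Big(\sum_{k=1}^{q} x_k\Big)\Big(\sum_{k=p+1}^{m} x_k\Big)$ as the element in
the pth row and qth column.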







Proof: The $\binom{m}{2}$ terms of the given double sum can be arranged in a triangular
array, and for a given n, $1 \le n \le m - 1$, $z_n^2$ will occur in a rectangular
subarray of the first n columns (say) and nth through (m − 1)th rows. The
sum of these n(m − n) terms is

$\sum_{i=1}^{n}\sum_{j=n+1}^{m} x_i x_j z_n^2 = \Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{j=n+1}^{m} x_j\Big) z_n^2.$

Also from the same triangular array it is apparent that for a given n and l,
$1 \le n < l \le m - 1$, the factor $2z_n z_l$ occurs in n(m − l) terms in the first n
columns and the lth through (m − 1)th rows. The sum of these terms is

$2\sum_{i=1}^{n}\sum_{j=l+1}^{m} x_i x_j z_n z_l = 2\Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{j=l+1}^{m} x_j\Big) z_n z_l.$

These same two sums would be obtained from the diagonal and the off-diagonal
elements respectively of the quadratic form whose matrix is defined above.
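(A.11) can likewise be checked numerically. With $x_k$ the class frequencies and $z_n$ the differences ${}_k d_n$, the matrix below, divided by s, is G of (1.6.4). The sketch uses exact rationals, one fixed example and a batch of seeded random trials; the function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations


def a11_double_sum(x, z):
    """Sum over i < j of x_i x_j (z_i + ... + z_{j-1})^2, with 0-based lists."""
    return sum(x[i] * x[j] * sum(z[i:j]) ** 2 for i, j in combinations(range(len(x)), 2))


def a11_matrix(x):
    """(m-1) x (m-1) matrix: entry (x_1 + ... + x_q)(x_{p+1} + ... + x_m) for p >= q (1-based)."""
    m = len(x)
    M = [[Fraction(0)] * (m - 1) for _ in range(m - 1)]
    for p in range(m - 1):
        for q in range(p + 1):
            M[p][q] = M[q][p] = Fraction(sum(x[: q + 1]) * sum(x[p + 1:]))
    return M


def quadratic_form(z, M):
    """z'Mz for a vector z and a square matrix M given as nested lists."""
    n = len(z)
    return sum(z[p] * M[p][q] * z[q] for p in range(n) for q in range(n))


x = [Fraction(v) for v in (3, 1, 4, 1, 5)]
z = [Fraction(v) for v in (2, -1, 7, 3)]
assert a11_double_sum(x, z) == quadratic_form(z, a11_matrix(x))

rng = random.Random(0)
for _ in range(100):
    m = rng.randint(2, 7)
    xs = [Fraction(rng.randint(0, 9)) for _ in range(m)]
    zs = [Fraction(rng.randint(-9, 9)) for _ in range(m - 1)]
    assert a11_double_sum(xs, zs) == quadratic_form(zs, a11_matrix(xs))
print("(A.11) holds in all trials")
```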
(A.12) If $\sum_{k=1}^{m} x_k > 0$ and no $x_k < 0$, the real symmetric (m − 1)th order
matrix whose element in the pth row and qth column, $p \ge q$, is $\Big(\sum_{k=1}^{q} x_k\Big)\Big(\sum_{k=p+1}^{m} x_k\Big)$
is at least positive semidefinite, with its vacuity equal to the number
of values of k for which $x_k = 0$.

Proof: $x_k \ne 0$ for at least one k. Without loss of generality suppose $x_k \ne 0$
for k = 1, ..., n and $x_k = 0$ for k = n + 1, ..., m. Adding $\big(\sum_{k=2}^{j} x_k\big)/x_1$
times the first column to the jth column makes $\sum_{k=1}^{j} x_k$ a common factor of all
elements in the jth column for j = 2, 3, ..., m − 1. Then subtracting the jth
row from the (j − 1)th row for j = 2, 3, ..., m − 1 leaves only zeros above
the principal diagonal. The elements in this diagonal are $x_1x_2$ in the first row and
$\big(\sum_{k=1}^{j} x_k\big)x_{j+1}$ in the jth row for j = 2, 3, ..., m − 1. Considering the
sequence of lower triangular submatrices consisting of the first k rows, for
k = 2, ..., n − 1, each is nonsingular with determinant equal to the product of its
diagonal elements. For $k \ge n$, each is singular. It follows from this and
Gundelfinger's rule [2] that the original matrix has n − 1 positive characteristic
roots and m − n zeros.
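The rank statement in (A.12) can be illustrated by computing the rank of the matrix exactly, with Gauss-Jordan elimination over `fractions.Fraction`, and comparing the resulting vacuity (nullity) with the number of zero $x_k$. The example vectors are arbitrary, and the elimination routine is a generic one, not the specific row and column operations of the proof.

```python
from fractions import Fraction


def a12_matrix(x):
    """Entry (p, q), 1-based, is (x_1 + ... + x_min(p,q)) * (x_{max(p,q)+1} + ... + x_m)."""
    m = len(x)
    return [
        [Fraction(sum(x[: min(p, q) + 1]) * sum(x[max(p, q) + 1:])) for q in range(m - 1)]
        for p in range(m - 1)
    ]


def rank(M):
    """Rank of a matrix of Fractions by Gauss-Jordan elimination."""
    A = [row[:] for row in M]
    r = 0
    for c in range(len(A[0]) if A else 0):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c] / A[r][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r


for x in ([1, 2, 3, 4], [2, 0, 3, 1], [0, 1, 0, 2, 5], [0, 0, 3]):
    nullity = (len(x) - 1) - rank(a12_matrix(x))
    print(x, nullity, x.count(0))  # the vacuity equals the number of zero x_k
```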


APPENDIX B 


Inner and Outer Boundaries 


Either to pick out the outer boundary from the family of loci given by (1.6.5)
or, failing that, to construct an outer boundary for the family, we consider the
matrix G defined by (1.6.4) with the restriction that $s_1 \ne 0$ and $s_{2k-1} \ne 0$. If
$g_{ij}$ denotes the element in the matrix of the outer boundary corresponding to
$G_{ij}$ in (1.6.4), then, since ${}_k d_n \ge 0$ for all n, it is sufficient that

(B.1)  $g_{ij} \le G_{ij}$ for all i, j = 1, ..., k − 1.







From (B.1) and (1.6.4) it follows that

(B.2)  $g_{ij} = (s-1)/s$ for i = j,
       $g_{ij} = 1/s$ for $i \ne j$.

For k = 2, 3 the outer boundary thus determined belongs to the family
(1.6.5); for $k \ge 4$, it does not.
On the other hand, to pick out the inner boundary from the family of loci, we
want the largest possible $G_{ij}$ for all i, j. Inspection of (1.6.4) leads to the
conclusion that for the inner boundary

(B.3)  $G_{ij} = (s^2 - 1)/(4s)$ for s odd,
       $G_{ij} = s/4$ for s even,

for all i, j. Thus it appears that for all values of k, the inner boundary is that
member of the family of (1.6.5) which is completely flat, i.e., with only one
nonvanishing characteristic root.
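For small s the claims of Appendix B can be verified exhaustively. The sketch below enumerates every bounding partition ($s_1 \ne 0$, $s_{2k-1} \ne 0$) for s = 6 and k = 2, ..., 5, checks that every entry of G lies between the outer values (B.2) and the inner value (B.3), and reports whether the outer matrix and the constant inner matrix are actually attained by some partition. The choice s = 6 is arbitrary and the function names are hypothetical.

```python
from fractions import Fraction


def compositions(s, k):
    """All k-tuples of non-negative integers summing to s."""
    if k == 1:
        yield (s,)
        return
    for first in range(s + 1):
        for rest in compositions(s - first, k - 1):
            yield (first,) + rest


def g_matrix(freqs):
    """G of (1.6.4) for class frequencies freqs (0-based)."""
    k, s = len(freqs), sum(freqs)
    return [
        [Fraction(sum(freqs[: min(i, j) + 1]) * sum(freqs[max(i, j) + 1:]), s)
         for j in range(k - 1)]
        for i in range(k - 1)
    ]


def outer_matrix(k, s):
    """(B.2): (s-1)/s on the diagonal, 1/s elsewhere."""
    return [[Fraction(s - 1, s) if i == j else Fraction(1, s) for j in range(k - 1)]
            for i in range(k - 1)]


def inner_value(s):
    """(B.3): (s^2 - 1)/(4s) for odd s, s/4 for even s."""
    return Fraction(s * s - 1, 4 * s) if s % 2 else Fraction(s, 4)


s = 6
for k in range(2, 6):
    bounding = [f for f in compositions(s, k) if f[0] > 0 and f[-1] > 0]
    outer, top = outer_matrix(k, s), inner_value(s)
    for f in bounding:
        G = g_matrix(f)
        assert all(outer[i][j] <= G[i][j] <= top for i in range(k - 1) for j in range(k - 1))
    outer_attained = any(g_matrix(f) == outer for f in bounding)
    inner_attained = any(all(v == top for row in g_matrix(f) for v in row) for f in bounding)
    print(k, outer_attained, inner_attained)
```

With s = 6 this prints `2 True True`, `3 True True`, `4 False True`, `5 False True`: the outer matrix (B.2) is itself a member of the family only for k = 2 and 3, while the completely flat inner matrix is attained (by putting all the frequency in the first and last classes, split as evenly as possible) for every k.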


REFERENCES


[1] Aitken, A. C., Determinants and Matrices, 8th ed., Oliver and Boyd, Edinburgh,
1954.
[2] Browne, E. T., Introduction to the Theory of Determinants and Matrices, University
of North Carolina Press, Chapel Hill, c. 1958.
[3] Ghosh, M. N., “Simultaneous tests of linear hypotheses,” Biometrika, Vol. 42 (1955),
pp. 441-449.
[4] Heck, D. L., “Some uses of the distribution of the largest root in multivariate
analysis,” Institute of Statistics, University of North Carolina, Mimeograph
Series No. 194, 1958.
[5] MacDuffee, C. C., The Theory of Matrices, J. Springer, Berlin, 1933.
[6] Ramachandran, K. V., “On the simultaneous analysis of variance test,” Ann. Math.
Stat., Vol. 27 (1956), pp. 521-528.
[7] Ramachandran, K. V., “Contribution to simultaneous confidence interval
estimation,” Biometrics, Vol. 12 (1956), pp. 51-56.
[8] Roy, S. N. and Gnanadesikan, R., “Further contributions to multivariate confidence
bounds,” Biometrika, Vol. 44 (1957), pp. 399-410.
[9] Roy, S. N. and Gnanadesikan, R., “Some contributions to ANOVA in one or more
dimensions: I,” Ann. Math. Stat., Vol. 30 (1959), pp. 304-317.
[10] Roy, S. N. and Gnanadesikan, R., “Some contributions to ANOVA in one or more
dimensions: II,” Ann. Math. Stat., Vol. 30 (1959), pp. 318-340.
[11] Roy, S. N. and Sarhan, A. E., “On inverting a class of patterned matrices,”
Biometrika, Vol. 43 (1956), pp. 227-231.
[12] Roy, S. N., Some Aspects of Multivariate Analysis, John Wiley and Sons, New York,
1957.
[13] Thompson, W. A. Jr., “On the ratio of variances in the mixed incomplete block model,”
Ann. Math. Stat., Vol. 26 (1955), pp. 721-733.
[14] Thompson, W. A. Jr., “The ratio of variances in a variance components model,” Ann.
Math. Stat., Vol. 26 (1955), pp. 325-329.
[15] Wilks, S. S., Mathematical Statistics, Princeton University Press, Princeton, 1943.
[16] Youden, W. J., “Linked blocks: a new class of incomplete block designs” (abstract),
Biometrics, Vol. 7 (1951), p. 124.





MIXED MODEL VARIANCE ANALYSIS WITH NORMAL ERROR AND
POSSIBLY NON-NORMAL OTHER RANDOM EFFECTS: PART II:
THE MULTIVARIATE CASE¹


By S. N. Roy and Whitfield Cobb
University of North Carolina


3.0. Introduction and summary. The present paper is a continuation of another 
with similar title and the same overall objectives—confidence bounds on appro- 
priate measures of the dispersion of the distribution from which random (block) 
effects are drawn in an experiment where fixed (treatment) effects are also under 
investigation. Specifically, confidence bounds are obtained on the maximum and 
minimum characteristic roots of the variance matrix of the block effects when the 
latter are assumed to come from a p-variate normal distribution (without the 
assumption made in [1], [5] that this variance matrix is proportional to that of 
the error). When the random block effects are not assumed to be normal, con- 
sideration is given to the approximation of an unknown multivariate distribution 
by means of marginal and conditional quantiles. Then for a rather restricted 
bivariate case, simultaneous confidence bounds are found for the two inter- 
quartile ranges. 

Since the ideas and notation of the first paper are presupposed by this one, 
much duplication is avoided by reference to appropriate sections or steps in the 
previous article. To facilitate such reference, the numbering is consecutive 
through both parts. 


3.1. The multiresponse model and its statistics. In the previous sections we 
have considered models in which the observed response was regarded as the sum 
of a normally distributed error and two or more effects due to treatments and 
blocks, but only one type of observation was to be made on each experimental 
unit. Now suppose that these several factors—whether called treatments or 
blocks and whether represented by fixed effects or random effects—are regarded 
as influencing more than one observable characteristic of the experimental units. 
This is the situation sometimes called a multivariate analysis of variance model,
but perhaps a better expression would be “multiresponse analysis of variance
model.” We shall suppose that the response is determined, in a particular
experiment, by observing p distinct (but presumably related) characteristics of each
experimental unit. Such observations are conveniently arranged as elements of 
the matrix Y(n × p). The experiment might indicate that the observations on
one or more of these p characteristics provided no additional information, in 
which case such characteristics could be dropped from the model. But regardless 
of how many or which characteristics are to be observed, the structure matrix, 

Received August 10, 1959; revised May 25, 1960. 


1 This research was sponsored by the United States Air Force through the Office of 
Scientific Research of the Air Research and Development Command. 







MIXED MODEL VARIANCE ANALYSIS: PART II 959 


as made specific by a chosen design and field plan, tells how the n experimental 
units are distributed among the cells of the cross classification, i.e., among the 
s blocks and t treatments. The p columns of Y are related to corresponding
columns of postulated treatment-effects and block-effects matrices by the same
structure matrix, since the different characteristics observed belong to the same 
experimental units. Thus analogous to (1.1.1), we have for a two-factor multi- 
response model 


(3.1.1)  $Y(n \times p) = M(n \times (s + t)) \begin{bmatrix} A(t \times p) \\ B(s \times p) \end{bmatrix} + E(n \times p),$

where the structure matrix M is exactly the same as in Section 1.1. The ele- 
ments of E are assumed to be normally distributed and of the nature of errors. 
As with the components of e in Section 1.1, the rows of E are assumed to be 
uncorrelated, but we do allow and expect correlation between elements in the 
same row. We further postulate that every row of E have a common variance
matrix, Σ(p). Then the ith column of E has as its variance matrix $\sigma_{ii}I(n)$,
where $\sigma_{ii}$ is the ith element in the principal diagonal of Σ(p). As in the
uniresponse model already considered, we regard the treatment effects A(t × p)
as fixed and the block effects B(s × p) as random, but not necessarily normal,
and independent of E.

Because the structure matrix M of (3.1.1) is the same as the structure matrix 
M of (1.1.2), the motivation and the actual derivation presented in Section 1.2 
are just as relevant here. 

The same three orthonormal matrices $L_0$, $L_1$, and L (we recall that $L_0L_0' = I(t-1)$,
$L_1L_1' = I(s-1)$, $LL' = I(n - r^*)$, $L_0L' = 0$, $L_1L' = 0$, but
$L_0L_1' \ne 0$ in general) are now used for defining three matrices of statistics in
terms of the matrix of multiresponse observations:


(3.1.2)  $U((t-1) \times p) = L_0((t-1) \times n)\,Y(n \times p),$
         $V((s-1) \times p) = L_1((s-1) \times n)\,Y(n \times p),$
         $W((n-r^*) \times p) = L((n-r^*) \times n)\,Y(n \times p).$


The immediate consequence of these definitions is the desired relation of sta- 
tistics to unobservables of the model: 


(3.1.3)  $U = T_0^{-1}HA + L_0E,$
         $V = T_1^{-1}HB + L_1E,$
         $W = LE.$


Moreover, the same interrelations among the vectors of (1.2.9) are preserved 
among the matrices of (3.1.3): U is independent of B though not of V; V is 
independent of A though not of U; W is independent of A, B, U, and V. But 
(3.1.3) has another feature which may be worth pointing out: (1) each column 
of U depends upon only the corresponding column of the treatment effects and 







the (normal) errors associated with observing that characteristic of the experi- 
mental unit; (2) each column of V depends upon only the corresponding column 
of the block effects and the (normal) errors associated with observing that 
characteristic; (3) each column of W depends upon only the column of (normal) 
errors associated with observing that characteristic. Here also we require that
the experimental design satisfy the linked block conditions which, in Section 1.4,
were shown to be sufficient to reduce $T_1$ to the scalar matrix $\delta^{-1}I$.


3.2. Quasi-confidence bounds. In obtaining the quasi-confidence bounds
for the uniresponse situation, we wanted $\begin{bmatrix} L_1 \\ L \end{bmatrix}$ orthonormal in order to preserve
for the s − 1 + n − r* components of $\begin{bmatrix} L_1e \\ Le \end{bmatrix}$ the same independent normal
distribution postulated for e. Similarly here a consequence of the orthonormality
of $\begin{bmatrix} L_1 \\ L \end{bmatrix}$ is that $\begin{bmatrix} L_1E \\ LE \end{bmatrix}$ consists of s − 1 + n − r* mutually independent rows,
each row having the same p-variate normal distribution postulated for every
row of E. One way of demonstrating this is given in Appendix C. Thus it follows
that $\mathcal{E}\{E'L_1'L_1E/(s-1)\} = \mathcal{E}\{W'W/(n - r^*)\} = \mathcal{E}\{E'E/n\}$. Whereas $w'w$
was a positive scalar and $w'w/\sigma^2$ had the central chi-square distribution, $W'W$
is (a.e.) a positive definite symmetric matrix and $W'W/(n - r^*)$ has the central
Wishart distribution with n − r* d.f., provided $n - r^* \ge p$. If $s - 1 \ge p$,
similar statements can be made about $E'L_1'L_1E$. Moreover, when the above
conditions are satisfied, the distribution of the characteristic roots of
$E'L_1'L_1E(W'W)^{-1}$ is known (cf. pp. 34-35 of [6]) and known to depend only
upon the constants s − 1, n − r*, and p. The distribution of the maximum of
such characteristic roots is now tabulated (cf. [2], [3], [4]). Hence for a chosen
α < 1 it is possible to find a constant $c_\alpha$ such that

(3.2.1)  $\Pr\{\mathrm{ch}_{\max}[E'L_1'L_1E(W'W)^{-1}] \le c_\alpha\} = 1 - \alpha.$


This probability statement corresponds to (1.3.1) for the univariate case. It is 
true regardless of the computed values of V and regardless of the distribution 
from which B is a sample. 


Using (3.1.3) to eliminate $L_1E$, the inequality within (3.2.1) becomes

(3.2.2)  $\mathrm{ch}_{\max}[(V - T_1^{-1}HB)'(V - T_1^{-1}HB)(W'W)^{-1}] \le c_\alpha.$

Since for positive definite M and at least positive semidefinite N,
$\mathrm{ch}_{\min}(M)\,\mathrm{ch}_{\min}(N) \le \mathrm{ch}(MN) \le \mathrm{ch}_{\max}(M)\,\mathrm{ch}_{\max}(N)$ (cf. A.1.22 of [6]), (3.2.2)
implies

(3.2.3)  $\mathrm{ch}_{\max}[(V - T_1^{-1}HB)'(V - T_1^{-1}HB)] \le c_\alpha\,\mathrm{ch}_{\max}(W'W).$

Then by Lemma 1.2e of [1], (3.2.3) is equivalent to

(3.2.4)  $|d'(V - T_1^{-1}HB)e| \le [c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2}$







for all unit vectors d and e. Whence

(3.2.5)  $d'Ve - [c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2} \le d'T_1^{-1}HBe \le d'Ve + [c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2}$

for all unit vectors d and e. It is easily seen that $d'Ve \le \sup(d'Ve)$ for all
unit vectors d and e, including that pair which maximizes $d'T_1^{-1}HBe$. Similarly
$\inf(d'Ve) \le d'Ve$ for all unit vectors d and e, including that pair which
minimizes $d'T_1^{-1}HBe$. Applying these arguments to the right and left inequalities,
respectively, of (3.2.5) yields


$\inf(d'Ve) - [c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2} \le \inf(d'T_1^{-1}HBe) \le \sup(d'T_1^{-1}HBe) \le \sup(d'Ve) + [c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2},$

which is equivalent to

(3.2.6)  $[\mathrm{ch}_{\min}(V'V)]^{1/2} - [c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2} \le [\mathrm{ch}_{\min}\,B'H'(T_1T_1')^{-1}HB]^{1/2}$
         $\le [\mathrm{ch}_{\max}\,B'H'(T_1T_1')^{-1}HB]^{1/2} \le [\mathrm{ch}_{\max}(V'V)]^{1/2} + [c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2}.$

This inequality, being implied by but not implying the inequality within (3.2.1),
would be true with probability not less than 1 − α. (The same remarks made
at the end of Section 1.3 would apply here.) Thus (3.2.6) is the multiresponse
analog of (1.3.3). For the large class of designs for which $T_1T_1' = \delta^{-2}I$, the two
central members of (3.2.6) may be simplified. And because of the non-negative
character of these two, the extreme left member may be replaced by a
non-negative upper bound. Thus if we define

(3.2.7)  $l_1 = \{[\delta^{-2}\,\mathrm{ch}_{\min}(V'V)]^{1/2} - [\delta^{-2}c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2}\}^2$, when the difference in braces is > 0,
         $l_1 = 0$ otherwise,
         $l_2 = \{[\delta^{-2}\,\mathrm{ch}_{\max}(V'V)]^{1/2} + [\delta^{-2}c_\alpha\,\mathrm{ch}_{\max}(W'W)]^{1/2}\}^2,$

then we have

(3.2.8)  $\Pr\{l_1 \le \mathrm{ch}_{\min}(B'H'HB) \le \mathrm{ch}_{\max}(B'H'HB) \le l_2\} \ge 1 - \alpha.$


This (3.2.8) is in the form of a confidence statement. The bounds, $l_1$ and $l_2$,
are computable from the observations, Y, and a chosen confidence coefficient,
1 − α. But the central terms are not explicit parameters or even parametric
functions. Hence we call (3.2.8) a quasi-confidence statement. In so far as
(3.2.8) is an intermediate step toward confidence bounds on certain parametric
functions, it may also be called a preliminary confidence statement.
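For p = 2 the characteristic roots of a symmetric 2 × 2 matrix have a closed form, so the bounds (3.2.7) can be sketched with the standard library alone. Here V and W are assumed to have been computed from (3.1.2), the design is assumed to satisfy $T_1T_1' = \delta^{-2}I$, and $c_\alpha$ is assumed to have been read from tables of the largest-root distribution. The function names and the inputs in the example are hypothetical; for p > 2 a general symmetric eigenvalue routine would replace `roots_2x2`.

```python
from math import sqrt


def gram(X):
    """Entries (a, b, c) of the symmetric 2 x 2 matrix X'X = [[a, b], [b, c]]."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    c = sum(r[1] * r[1] for r in X)
    return a, b, c


def roots_2x2(a, b, c):
    """Characteristic roots (smallest, largest) of [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = sqrt(((a - c) / 2) ** 2 + b * b)
    return mean - radius, mean + radius


def quasi_bounds(V, W, delta, c_alpha):
    """l_1 and l_2 of (3.2.7) for p = 2, given V, W (lists of rows), delta and c_alpha."""
    v_min, v_max = roots_2x2(*gram(V))
    w_max = roots_2x2(*gram(W))[1]
    noise = sqrt(c_alpha * w_max) / delta  # [delta^-2 c_alpha ch_max(W'W)]^(1/2)
    lower = sqrt(max(v_min, 0.0)) / delta - noise
    l1 = lower ** 2 if lower > 0 else 0.0
    l2 = (sqrt(v_max) / delta + noise) ** 2
    return l1, l2


# Hypothetical inputs: V'V = diag(4, 1), W'W = I, delta = 1, c_alpha = 0.25.
print(quasi_bounds([[2, 0], [0, 1]], [[1, 0], [0, 1], [0, 0]], 1.0, 0.25))  # (0.25, 6.25)
```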


3.3. Multivariate normal random effects. It has already been stated that the
rows of B are independently but identically distributed. Suppose now it is
further specified that this common distribution be a p-variate normal with
(unknown) variance matrix denoted by $\Sigma_1(p)$. Then regardless of $\mathcal{E}(B)$, HB
has zero expectation, and $\mathcal{E}(B'H'HB) = (s-1)\Sigma_1$. Moreover the distribution
of $\mathrm{ch}(B'H'HB\Sigma_1^{-1})$ is known to depend only on the constants s − 1 and p.







Thus for a chosen $\alpha_1 < 1$ we can find two constants $c_1$ and $c_2$ such that

(3.3.1)  $\Pr\{c_1 \le \mathrm{ch}_{\min}(B'H'HB\Sigma_1^{-1}) \le \mathrm{ch}_{\max}(B'H'HB\Sigma_1^{-1}) \le c_2\} = 1 - \alpha_1.$


Since the non-zero roots, $\mathrm{ch}(M_1M_2^{-1})$, are the stationary values of $e'M_1e/e'M_2e$
(A.2.1 of [6]), $\mathrm{ch}_{\max}(B'H'HB\Sigma_1^{-1}) \le c_2$ is equivalent to $e'B'H'HBe/c_2 \le e'\Sigma_1e$
for all vectors e. Moreover $\inf(e'B'H'HBe) \le e'B'H'HBe$ for all unit vectors e, including
that choice which minimizes $e'\Sigma_1e$. Thus $\mathrm{ch}_{\min}(B'H'HB)/c_2 \le \mathrm{ch}_{\min}(\Sigma_1)$, and a
similar argument leads to $\mathrm{ch}_{\max}(\Sigma_1) \le \mathrm{ch}_{\max}(B'H'HB)/c_1$. Thus the
probability statement (3.3.1) leads to

(3.3.2)  $\Pr\{\mathrm{ch}_{\min}(B'H'HB)/c_2 \le \mathrm{ch}_{\min}(\Sigma_1) \le \mathrm{ch}_{\max}(\Sigma_1) \le \mathrm{ch}_{\max}(B'H'HB)/c_1\} \ge 1 - \alpha_1,$

which is in the form of a confidence statement. What keeps (3.3.2) from being
a bona fide confidence statement is that its bounds are not actually computable
from observations. However, we do have bounds (quasi-confidence bounds) on
these bounds. Combining (3.2.8) and (3.3.2) gives

(3.3.3)  $\Pr\{l_1/c_2 \le \mathrm{ch}_{\min}(\Sigma_1) \le \mathrm{ch}_{\max}(\Sigma_1) \le l_2/c_1\} \ge (1 - \alpha)(1 - \alpha_1).$


Because of the postulated independence of E and B, the respective confidence 
coefficients of (3.2.8) and (3.3.2) may be multiplied as in (3.3.3). 
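The final step (3.3.3) only divides the quasi-confidence bounds by the constants of (3.3.1) and multiplies the two confidence coefficients. In the sketch below $l_1$, $l_2$, $c_1$, $c_2$ are assumed to be already available (the latter two from the distribution of the roots of $B'H'HB\Sigma_1^{-1}$); the function name and the numbers are hypothetical.

```python
def sigma_root_bounds(l1, l2, c1, c2, alpha, alpha1):
    """Bounds (3.3.3) on ch_min(Sigma_1) and ch_max(Sigma_1), with their joint confidence.

    l1, l2 : quasi-confidence bounds of (3.2.8)
    c1, c2 : constants of (3.3.1) for the chosen alpha1
    Returns (lower, upper, coefficient), where coefficient = (1 - alpha)(1 - alpha1)
    is the lower bound on the probability stated in (3.3.3).
    """
    if not 0 < c1 <= c2:
        raise ValueError("expected 0 < c1 <= c2")
    return l1 / c2, l2 / c1, (1 - alpha) * (1 - alpha1)


# Hypothetical numbers, for illustration only.
print(sigma_root_bounds(0.25, 6.25, 0.5, 5.0, 0.05, 0.05))
```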

Confidence bounds on $\mathrm{ch}(\Sigma_1)$ and on $\sigma_1^2$, where it was assumed that $\Sigma_1 = \sigma_1^2\Sigma$,
have been obtained previously [1], [5]. But (3.3.3) requires no such restrictive
assumption. Of course the characteristic roots of a variance matrix are not in 
themselves easily interpreted parameters like standard deviations of the several 
variates, but they do constitute a measure of dispersion. 


3.4. Marginal and conditional m-tiles. In Section 1 we proposed to replace the unobservable nonnormal (or not-necessarily-normal) block-effects variate by a substitute variate taking k unknown values with equal probabilities. We then proceeded to obtain simultaneous confidence bounds on the differences between these successive unknown values, and to interpret these k − 1 differences as estimates of the differences between the successive odd 2k-tiles of the population of block effects. In the present section we propose a similar procedure for the multiresponse mixed model.

In a practical approach to a multiresponse experiment, the p characteristics to be observed are apt to be selected one at a time, in the order of their interest or presumed relevance to the factors being studied. Accordingly we adopt the convention that in all multiresponse models the subscripts 1, 2, …, p on column vectors from Y, A, B, E, U, V, or W indicate respectively the most important, the next most important, …, the least important characteristic.

Consistent with the notation introduced in Section 1.5, in a multiresponse model ${}_m\beta_{n_1}$ will denote the $n_1$th m-tile of the marginal distribution of the first variate, or first characteristic, of the block-effects factor. The symbol ${}_m\beta_{n_1,n_2}$ will denote the $n_2$th m-tile of the conditional distribution of the second variate, or second characteristic, given that the first variate lies below the





MIXED MODEL VARIANCE ANALYSIS: PART II 963 


$(n_1 + 1)$th and not below the $(n_1 - 1)$th m-tile of its marginal distribution. Similarly ${}_m\beta_{n_1,n_2,n_3}$ will denote the $n_3$th m-tile of the conditional distribution of the third variate, given two conditions:

- the first variate lies below the $(n_1 + 1)$th m-tile and not below the $(n_1 - 1)$th m-tile of its marginal distribution;
- the second variate lies below the $(n_2 + 1)$th m-tile and not below the $(n_2 - 1)$th m-tile of its conditional distribution;

and so on for later variates. As in Section 1.5 we consider only even values of m, say $m = 2k$, and only odd values of the $n_i$. Writing $b_i$ for any element in the ith column of $B(s \times p)$, we have, no matter what the distribution of $b_i$,


$\Pr\{{}_{2k}\beta_{n_1-1} \le b_1 < {}_{2k}\beta_{n_1+1}\} = 1/k,$

$\Pr\{{}_{2k}\beta_{n_1,\,n_2-1} \le b_2 < {}_{2k}\beta_{n_1,\,n_2+1} \mid {}_{2k}\beta_{n_1-1} \le b_1 < {}_{2k}\beta_{n_1+1}\} = 1/k,$

$\Pr\{{}_{2k}\beta_{n_1,\,n_2,\,n_3-1} \le b_3 < {}_{2k}\beta_{n_1,\,n_2,\,n_3+1} \mid {}_{2k}\beta_{n_1-1} \le b_1 < {}_{2k}\beta_{n_1+1} \text{ and } {}_{2k}\beta_{n_1,\,n_2-1} \le b_2 < {}_{2k}\beta_{n_1,\,n_2+1}\} = 1/k;$ etc.

Combining p such probability statements yields


(3.4.1)   $\Pr\{{}_{2k}\beta_{n_1-1} \le b_1 < {}_{2k}\beta_{n_1+1};\ {}_{2k}\beta_{n_1,\,n_2-1} \le b_2 < {}_{2k}\beta_{n_1,\,n_2+1};\ \ldots;\ {}_{2k}\beta_{n_1,\ldots,\,n_p-1} \le b_p < {}_{2k}\beta_{n_1,\ldots,\,n_p+1}\} = 1/k^p.$


In Section 1.5 the presumably continuous but unknown distribution of the random block effects was approximated by the discrete distribution whose equally probable values were the odd 2k-tiles of the unknown distribution. It was as if the data were classified into k classes, with class boundaries at the even 2k-tiles and with the odd 2k-tiles used as class marks. This scheme may be extended to bivariate and even p-variate distributions.

The data of a bivariate distribution may be classified into $k^2$ classes. The class boundaries are the even 2k-tiles of the marginal distribution of the first variate and of the conditional distributions of the second variate given the class of the first variate. For example, with $k = 2$ there would be four classes defined as follows:

(1) ${}_4\beta_0 \le b_1 < {}_4\beta_2$, ${}_4\beta_{10} \le b_2 < {}_4\beta_{12}$;
(2) ${}_4\beta_0 \le b_1 < {}_4\beta_2$, ${}_4\beta_{12} \le b_2 < {}_4\beta_{14}$;
(3) ${}_4\beta_2 \le b_1 < {}_4\beta_4$, ${}_4\beta_{30} \le b_2 < {}_4\beta_{32}$;
(4) ${}_4\beta_2 \le b_1 < {}_4\beta_4$, ${}_4\beta_{32} \le b_2 < {}_4\beta_{34}$.

As class marks for these classes we would take the following pairs: (1) $b_1 = {}_4\beta_1$, $b_2 = {}_4\beta_{11}$; (2) $b_1 = {}_4\beta_1$, $b_2 = {}_4\beta_{13}$; (3) $b_1 = {}_4\beta_3$, $b_2 = {}_4\beta_{31}$; (4) $b_1 = {}_4\beta_3$, $b_2 = {}_4\beta_{33}$.

A bivariate which takes on just these four pairs of values we denote by $({}_4b_1, {}_4b_2)$. When it is used in place of the presumably continuous bivariate $(b_1, b_2)$, we call the former the “substitute variate”. Similarly $({}_{2k}b_1, {}_{2k}b_2)$ will denote a substitute variate which takes on $k^2$ equally probable pairs of values, viz.,


(3.4.3)   $\Pr\{{}_{2k}b_1 = {}_{2k}\beta_{2m-1},\ {}_{2k}b_2 = {}_{2k}\beta_{2m-1,\,2n-1}\} = 1/k^2$


for m = 1, 2, …, k and n = 1, 2, …, k. For a trivariate situation the distribution of the substitute variate would be defined by


(3.4.4)   $\Pr\{{}_{2k}b_1 = {}_{2k}\beta_{2m-1},\ {}_{2k}b_2 = {}_{2k}\beta_{2m-1,\,2n-1},\ {}_{2k}b_3 = {}_{2k}\beta_{2m-1,\,2n-1,\,2q-1}\} = 1/k^3,$


(3.4.2) 







for m, n, q = 1, 2, …, k. The extension to a larger number of characteristics is now obvious and need not be written out.
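For k = 2, the construction of the four classes and their class marks can be imitated on a sample. The Python sketch below is only an illustration. It uses sample quartiles, computed with the standard-library function `statistics.quantiles`, in place of the unknown population 4-tiles, which the procedure in the text never observes. The function name `substitute_classes` and its return layout are hypothetical.

```python
import random
import statistics


def substitute_classes(pairs):
    """Sample analogue of the k = 2 substitute variate of (3.4.3).

    Sample quartiles stand in for the population 4-tiles (illustrative
    assumption). Returns {(i, j): (mark_b1, mark_b2, count)} with i, j in
    {1, 3}, so key (i, j) corresponds to the class mark (beta_i, beta_ij).
    """
    q1, q2, q3 = statistics.quantiles([b1 for b1, _ in pairs], n=4)
    halves = {1: [p for p in pairs if p[0] < q2],    # below the marginal median
              3: [p for p in pairs if p[0] >= q2]}   # not below the marginal median
    marks_b1 = {1: q1, 3: q3}                         # odd 4-tiles of b1 as class marks
    result = {}
    for i, half in halves.items():
        # Conditional 4-tiles of b2 within the class of b1.
        c1, c2, c3 = statistics.quantiles([b2 for _, b2 in half], n=4)
        lower = sum(1 for _, b2 in half if b2 < c2)
        result[(i, 1)] = (marks_b1[i], c1, lower)
        result[(i, 3)] = (marks_b1[i], c3, len(half) - lower)
    return result


rng = random.Random(0)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
for key, (m1, m2, count) in sorted(substitute_classes(sample).items()):
    print(key, round(m1, 3), round(m2, 3), count)
```

Because the class boundaries are sample medians, each of the four classes receives exactly 100 of the 400 simulated pairs. The printed class marks are merely sample estimates of ${}_4\beta_1$, ${}_4\beta_3$, ${}_4\beta_{11}$, and so on.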

These marginal and conditional m-tiles provide a means of distinguishing different kinds of interrelatedness or dependence in a multivariate distribution. For simplicity of discussion, we shall consider p = 2 and k = 2. Then it is possible to classify bivariate distributions into four types, the first being the most general and including all distributions not qualifying for the other types.

- Type 2: the conditional distributions of the second variate, given different values of the first variate, all have the same dispersion as measured by the interquartile difference. The variates are nevertheless “dependent” in sense (i): the medians of the conditional distributions of the second variate differ for different values of the first variate.
- Type 3: the conditional distributions of the second variate, given different values of the first variate, all have the same median. The variates are nevertheless “dependent” in sense (ii): the conditional distributions of the second variate have different dispersions as measured by the interquartile difference.
- Type 4: the variates are not “dependent” in either sense (i) or sense (ii). A bivariate normal of type 4 would consist of two independent normal variates.


3.5. Confidence bounds in a simple bivariate case. In Section 3.2 we obtained preliminary, or quasi-confidence, bounds on $\mathrm{ch}(B'H'HB)$ regardless of the distribution of B. In Section 3.3 we then used these quasi-confidence bounds as a preliminary stage in finding genuine confidence bounds on $\mathrm{ch}(\Sigma_1)$ when B was assumed to be normal, each row of B having $\Sigma_1$ as its variance matrix. But suppose B is not normal, or is not assumed to be normal prior to the experiment. Is it still possible to use the quasi-confidence bounds to obtain a confidence statement about the marginal and conditional m-tiles described in Section 3.4? The answer is yes. At least for the simplest multiresponse model (p = 2, k = 2, type 4), and perhaps for others as well, it is feasible as well as possible.

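For p = 2, the quadratic formula may be used to find explicitly $\mathrm{ch}_{\min}(B'H'HB) = \lambda_1$ and $\mathrm{ch}_{\max}(B'H'HB) = \lambda_2$:

(3.5.1)   $\lambda_1, \lambda_2 = \tfrac{1}{2}(b_1'H'Hb_1 + b_2'H'Hb_2) \mp \tfrac{1}{2}\{(b_1'H'Hb_1 - b_2'H'Hb_2)^2 + 4(b_1'H'Hb_2)^2\}^{1/2}.$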


Next we set the two $\lambda_i$ of (3.5.1) equal respectively to the $l_i$ of (3.2.7). This gives the two extreme conditions permitted by the quasi-confidence statement (3.2.8). Algebraic simplification results in


(3.5.2)   $(b_1'H'Hb_2)^2 - (b_1'H'Hb_1)(b_2'H'Hb_2) + l_i(b_1'H'Hb_1 + b_2'H'Hb_2) - l_i^2 = 0.$
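Equation (3.5.2) is simply the characteristic equation of the 2 × 2 matrix $B'H'HB$ evaluated at a root. The Python sketch below makes this concrete for an arbitrary $s \times 2$ matrix. It computes the forms $b_i'H'Hb_j$, obtains $\lambda_1, \lambda_2$ from (3.5.1), and evaluates the left side of (3.5.2) at each root.

The sketch assumes that the rows of H are orthonormal and orthogonal to the unit vector, so that $H'H = I - J/s$ and $b_i'H'Hb_j$ is the corrected sum of products of columns i and j. This is an illustration; the definition of H in Part I is not reproduced here. The function names and the data rows are hypothetical.

```python
import math


def centered_forms(rows):
    """Return (q11, q22, q12) = b1'H'Hb1, b2'H'Hb2, b1'H'Hb2 for an s x 2 matrix,
    assuming H'H = I - J/s, i.e. corrected sums of squares and products."""
    s = len(rows)
    m1 = sum(r[0] for r in rows) / s
    m2 = sum(r[1] for r in rows) / s
    q11 = sum((r[0] - m1) ** 2 for r in rows)
    q22 = sum((r[1] - m2) ** 2 for r in rows)
    q12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    return q11, q22, q12


def roots_351(q11, q22, q12):
    """lambda_1 <= lambda_2 from the quadratic formula (3.5.1)."""
    half_sum = (q11 + q22) / 2
    half_rad = math.sqrt((q11 - q22) ** 2 + 4 * q12 ** 2) / 2
    return half_sum - half_rad, half_sum + half_rad


def lhs_352(q11, q22, q12, l):
    """Left side of (3.5.2) with l_i = l."""
    return q12 ** 2 - q11 * q22 + l * (q11 + q22) - l ** 2


rows = [(1.0, 2.0), (3.0, 1.5), (-0.5, 4.0), (2.0, 2.5), (0.0, -1.0)]
q = centered_forms(rows)
for lam in roots_351(*q):
    print(round(lhs_352(*q, lam), 10))
```

Both printed values should be zero up to floating-point rounding, which is the content of (3.5.2).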


For type 4 there are no regression-like parameters to be found, only the separate measures of dispersion. For k = 2 there are merely ${}_4\beta_3 - {}_4\beta_1$ and ${}_4\beta_{13} - {}_4\beta_{11}$, since ${}_4\beta_{31} = {}_4\beta_{11}$ and ${}_4\beta_{33} = {}_4\beta_{13}$. Now replacing the unknown variate $(b_1, b_2)$









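by the substitute variate $({}_4b_1, {}_4b_2)$ means that the s rows of B are replaced as follows:

- so many rows, say $s_{11}$, equal to $({}_4\beta_1, {}_4\beta_{11})$;
- so many rows, say $s_{13}$, equal to $({}_4\beta_1, {}_4\beta_{13})$;
- and so on,

where $s_{11} + s_{13} + s_{31} + s_{33} = s$. When these replacements are made in the quadratic and bilinear forms occurring in (3.5.2), Lemma A.10 may be used, and (3.5.2) becomes

(3.5.3)   $s^{-2}[(s_{11}+s_{13})(s_{31}+s_{33})(s_{11}+s_{31})(s_{13}+s_{33}) - (s_{11}s_{33} - s_{13}s_{31})^2]\,({}_4\beta_3 - {}_4\beta_1)^2({}_4\beta_{13} - {}_4\beta_{11})^2 - l_i s^{-1}[(s_{11}+s_{13})(s_{31}+s_{33})({}_4\beta_3 - {}_4\beta_1)^2 + (s_{11}+s_{31})(s_{13}+s_{33})({}_4\beta_{13} - {}_4\beta_{11})^2] + l_i^2 = 0.$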
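Equation (3.5.3) can be checked against a direct computation. The Python sketch below does this in four steps:

1. Build the s rows of the substitute variate for a chosen partition $(s_{11}, s_{13}, s_{31}, s_{33})$ and chosen 4-tile values.
2. Form the corrected sums of squares and products, assuming for illustration that $H'H = I - J/s$.
3. Evaluate the left side of (3.5.2) from those sums.
4. Evaluate the left side of (3.5.3) from the closed form.

Written this way, the left side of (3.5.3) is exactly $-1$ times that of (3.5.2), so the two equations define the same locus. The numbers are arbitrary, all arithmetic is exact (`fractions.Fraction`), and Lemma A.10 itself is not reproduced here.

```python
from fractions import Fraction as F


def lhs_352_direct(counts, beta1, beta3, beta11, beta13, l):
    """Left side of (3.5.2) computed from explicit substitute-variate rows
    (type 4: the conditional 4-tiles of b2 do not depend on the class of b1)."""
    s11, s13, s31, s33 = counts
    rows = ([(beta1, beta11)] * s11 + [(beta1, beta13)] * s13
            + [(beta3, beta11)] * s31 + [(beta3, beta13)] * s33)
    s = len(rows)
    m1 = sum(r[0] for r in rows) / s
    m2 = sum(r[1] for r in rows) / s
    q11 = sum((r[0] - m1) ** 2 for r in rows)
    q22 = sum((r[1] - m2) ** 2 for r in rows)
    q12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    return q12 ** 2 - q11 * q22 + l * (q11 + q22) - l ** 2


def lhs_353(counts, beta1, beta3, beta11, beta13, l):
    """Left side of (3.5.3) with x = (beta3 - beta1)^2, y = (beta13 - beta11)^2."""
    s11, s13, s31, s33 = counts
    s = s11 + s13 + s31 + s33
    x = (beta3 - beta1) ** 2
    y = (beta13 - beta11) ** 2
    a = F((s11 + s13) * (s31 + s33) * (s11 + s31) * (s13 + s33)
          - (s11 * s33 - s13 * s31) ** 2, s ** 2)
    b = F((s11 + s13) * (s31 + s33), s)
    c = F((s11 + s31) * (s13 + s33), s)
    return a * x * y - l * (b * x + c * y) + l ** 2


counts = (3, 1, 2, 4)
values = (F(1), F(5, 2), F(-1), F(3, 4))   # beta1, beta3, beta11, beta13 (arbitrary)
l = F(7, 3)
print(lhs_352_direct(counts, *values, l), lhs_353(counts, *values, l))
```

The printed pair should be two exact rationals that are negatives of each other.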


For a given partition of s and a given $l_i$, (3.5.3) may be regarded as determining a locus in the plane of $({}_4\beta_3 - {}_4\beta_1)^2$ and $({}_4\beta_{13} - {}_4\beta_{11})^2$. Note that the coordinates to be used are the squares of these interquartile differences. All possible partitions of s thus determine a finite family of conics for each value of $l_i$. As in Section 1.6, some of these loci are called bounding because they (together with the two coordinate axes) would enclose a region of the first quadrant.

The loci fall into the following cases:

- Any three components of the partition are zero: (3.5.3) becomes a contradiction.
- $s_{11}$ and $s_{13}$, or $s_{31}$ and $s_{33}$, are zero and no other component is: the locus of (3.5.3) is a line parallel to one coordinate axis.
- $s_{11}$ and $s_{33}$, or $s_{13}$ and $s_{31}$, are zero and no other component is: the locus of (3.5.3) is a line through the first quadrant making equal intercepts.
- $s_{11}s_{33} = s_{13}s_{31} \ne 0$: the locus of (3.5.3) is two lines parallel to the coordinate axes and intersecting in the first quadrant.
- All other possible partitions of s: the locus of (3.5.3) is a rectangular hyperbola whose asymptotes are parallel to the coordinate axes and whose center is in the first quadrant.

The first two cases are the ones in which (3.5.3) does not correspond to bounding loci. The inner boundary of the set of bounding loci is itself a locus of (3.5.3), viz., the straight line whose equation is


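(3.5.4)   $({}_4\beta_3 - {}_4\beta_1)^2 + ({}_4\beta_{13} - {}_4\beta_{11})^2 = 4sl_1/(s^2 - 1)$   or   $({}_4\beta_3 - {}_4\beta_1)^2 + ({}_4\beta_{13} - {}_4\beta_{11})^2 = 4l_1/s$,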


depending upon whether s is odd or even. But there is no unique outer boundary 
among the bounding loci of (3.5.3). By comparing the various types of bounding 
loci enumerated above, it is easy to pick out the four segments of three members 
of the family (3.5.3) which constitute the outer boundary. Thus the confidence 
region is given by 
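(3.5.5)
(i) $({}_4\beta_3 - {}_4\beta_1)^2 + ({}_4\beta_{13} - {}_4\beta_{11})^2 \le sl_2/(s-1)$, on two ranges:
    for $0 \le ({}_4\beta_3 - {}_4\beta_1)^2 \le s(s-3)l_2/2(s-1)(s-2)$,
    and for $sl_2/2(s-2) \le ({}_4\beta_3 - {}_4\beta_1)^2 \le sl_2/(s-1)$;
(ii) $0 \le ({}_4\beta_{13} - {}_4\beta_{11})^2 \le sl_2/2(s-2)$,
    for $s(s-3)l_2/2(s-1)(s-2) \le ({}_4\beta_3 - {}_4\beta_1)^2 \le sl_2/2(s-2)$;
(iii) $({}_4\beta_3 - {}_4\beta_1)^2 + ({}_4\beta_{13} - {}_4\beta_{11})^2 \ge 4sl_1/(s^2 - 1)$ for s odd,
or
(iv) $({}_4\beta_3 - {}_4\beta_1)^2 + ({}_4\beta_{13} - {}_4\beta_{11})^2 \ge 4l_1/s$ for s even,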







where (i) and (ii) exhibit the outer boundary, and (iii) or (iv) the inner boundary.
Associated with each partition of s is the a priori probability


(3.5.6)   $\dfrac{s!}{s_{11}!\,s_{13}!\,s_{31}!\,s_{33}!}\,4^{-s}.$
In the present case the total a priori probability associated with all bounding loci is easily found by subtracting from unity the probability of the non-bounding loci:

(3.5.7)   $\gamma = 1 - 2^{1-s}.$

It is interesting to note that here, for p = 2 and k = 2 (where there are 4 equiprobable values), γ is exactly the same as in the uniresponse case with k = 2 (where there were 2 equiprobable values).

Now since B and E are independent by hypothesis, we may multiply the 1 − α of the quasi-confidence statement (3.2.8) by the γ of (3.5.7) to obtain a lower bound on the final confidence coefficient. The final confidence statement says the following: with probability at least $(1-\alpha)\gamma$, the interquartile differences ${}_4\beta_3 - {}_4\beta_1$ and ${}_4\beta_{13} - {}_4\beta_{11}$ are the positive square roots of the coordinates of some point lying in a certain first-quadrant region. That region is bounded by (3.5.5) and by the axes $({}_4\beta_3 - {}_4\beta_1)^2 = 0$ and $({}_4\beta_{13} - {}_4\beta_{11})^2 = 0$.

Of course here, as in the uniresponse model, there is an element of approximation due to replacing the unknown variate by the $k^2$-valued substitute variate. But presumably here too the degree of approximation can be improved by increasing k.


APPENDIX C 
Variances and Covariances for a Matrix of Normal Variates 
It is a well-established custom to exhibit the n variances and $\binom{n}{2}$ covariances of a set of n normal variates as elements of an n × n matrix, displaying each covariance twice. Thus we say a stochastic vector has a variance-covariance matrix. If the elements of an m × n matrix were first written as components of a single vector, then the mn variances and $\binom{mn}{2}$ covariances of those elements could be displayed as elements of an mn × mn symmetric matrix. But it would certainly be desirable to arrange these variances and covariances so that properties of the rows (or columns) of the original matrix are readily apparent from this larger matrix. To facilitate this systematization we define ad hoc a special vector and list its properties.

(C.1) $h'(4) = (1, 0, 0, 1)$, $h'(9) = (1, 0, 0, 0, 1, 0, 0, 0, 1)$, and for any positive integer m, $h(m^2)$ will denote a column vector with $m^2$ components, m of which (including the first and last components) are unity, with m zeros between successive unities.







(C.2) $h'(m^2)h(m^2) = h'(m^2)j(m^2) = m$.

(C.3) $h(m^2)h'(m^2) = h(m^2) \dot\times h'(m^2)$, where $\dot\times$ indicates the left direct product as in Section 0.2.

(C.4) $[A(m \times n) \dot\times I(n)]h(n^2) = a(mn)$, where the m elements in the jth column of A become respectively the $(jm - m + 1)$th through $jm$th components of a, for $j = 1, 2, \ldots, n$.

(C.5) $[I(m) \dot\times h'(n^2)][a(mn) \dot\times I(n)] = A(m \times n)$, where the $(jm - m + 1)$th through $jm$th components of a become respectively the elements of the jth column of A, for $j = 1, 2, \ldots, n$.
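Properties (C.1), (C.2), (C.4) and (C.5) can be checked mechanically with small integer matrices. The sketch below reads the left direct product $A \dot\times B$ as the Kronecker product $B \otimes A$, in which each element of B multiplies a whole copy of A. That is the reading under which (C.4) stacks the columns of A as stated. It is our assumption about Section 0.2, which is not reproduced here.

The helper names `kron`, `left_direct`, `matmul` and `h` are hypothetical, and the code is plain Python with no external libraries.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def kron(x, y):
    """Standard Kronecker product x (x) y: block (i, j) is x[i][j] * y."""
    return [[x[i // len(y)][j // len(y[0])] * y[i % len(y)][j % len(y[0])]
             for j in range(len(x[0]) * len(y[0]))]
            for i in range(len(x) * len(y))]


def left_direct(a, b):
    """Left direct product a .x b, read here as the Kronecker product b (x) a."""
    return kron(b, a)


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def transpose(x):
    return [list(col) for col in zip(*x)]


def h(m):
    """Column vector h(m^2) of (C.1): m unities separated by runs of m zeros."""
    return [[1 if k % (m + 1) == 0 else 0] for k in range(m * m)]


A = [[1, 2, 3], [4, 5, 6]]                                       # m = 2, n = 3
m, n = len(A), len(A[0])
a = matmul(left_direct(A, identity(n)), h(n))                    # (C.4)
print([row[0] for row in a])                                     # prints [1, 4, 2, 5, 3, 6]
back = matmul(left_direct(identity(m), transpose(h(n))),
              left_direct(a, identity(n)))                       # (C.5)
print(back == A)                                                 # prints True
```

The first printed list is A read column by column, which is exactly the pattern (C.4) describes.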

Now, starting with a matrix A(m × n) of normal variates, we collapse it into the vector a(mn), defined as in (C.4). This separates the elements in the same row of A but keeps consecutive the elements in the same column. Hence the covariance matrix Σ(mn) will have the following pattern.

- The variances of the consecutive elements in the jth column of A will be consecutive elements along the principal diagonal of Σ, from the $(jm - m + 1)$th through the $jm$th row, for $j = 1, 2, \ldots, n$.
- The $\binom{m}{2}$ covariances of elements in the jth column of A will appear (twice) as the nondiagonal elements of the m × m principal submatrix in the $(jm - m + 1)$th through the $jm$th rows of Σ.
- Consider the $\binom{n}{2}$ covariances among the elements of the kth row. The covariance of the ith and jth elements will appear (twice) as the kth diagonal element of a nonprincipal m × m submatrix. That submatrix lies in the $(im - m + 1)$th through $im$th rows (columns) and the $(jm - m + 1)$th through $jm$th columns (rows).
- The $mn(m - 1)(n - 1)/2$ covariances of elements in neither the same row nor the same column of A will appear (twice) as the nondiagonal elements of these nonprincipal submatrices.

(C.6) If and only if each column of A has the identical variance-covariance matrix $\Sigma_2(m)$ and the n columns are independent, then $\Sigma(mn) = \Sigma_2(m) \dot\times I(n)$.

(C.7) If and only if each row of A has the identical variance-covariance matrix $\Sigma_1(n)$ and the m rows are independent, then $\Sigma(mn) = I(m) \dot\times \Sigma_1(n)$.

Now suppose we make a transformation of the original matrix A, obtaining $B(q \times n) = C(q \times m)A$, and want to know the variances and covariances of the elements of B. Applying (C.4) to B gives $b(qn) = [CA \dot\times I(n)]h(n^2)$. By a property of Kronecker products, this can be written $b = [C \dot\times I(n)][A \dot\times I(n)]h(n^2)$, which is easily recognized as $b = [C \dot\times I(n)]a$. Thus premultiplying the matrix $A(m \times n)$ by the matrix $C(q \times m)$ corresponds to premultiplying the vector $a(mn)$ by the matrix $[C \dot\times I(n)]$. Hence the variance-covariance matrix of B can be found as easily as that of a vector.

(C.8) If $\Sigma(mn)$ is the variance-covariance matrix of the matrix $A(m \times n)$, then the variance-covariance matrix of $C(q \times m)A$ is $[C \dot\times I(n)]\,\Sigma\,[C' \dot\times I(n)]$.

If the conditions of (C.6) are satisfied, the above matrix reduces to $C\Sigma_2C' \dot\times I(n)$. If the conditions of (C.7) are satisfied, the same matrix becomes $CC' \dot\times \Sigma_1$. Comparing this latter form with (C.7) we obtain the following conclusion:







(C.9) An orthonormal (or orthogonal) transformation applied to a matrix 
whose rows are uncorrelated yields a new matrix whose rows are uncorrelated, 
and if each row of the original matrix has a common variance-covariance matrix, 
it will be the variance-covariance matrix of every row of the new matrix. 
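The reductions stated after (C.8), and hence (C.9), can be checked with small integer matrices. Here too the left direct product $A \dot\times B$ is read as the Kronecker product $B \otimes A$, the reading under which (C.4) stacks columns; that reading is our assumption. The matrices C, $\Sigma_1$ and $\Sigma_2$ below are arbitrary illustrative values, not covariance matrices from any data, and the helper names are hypothetical.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def kron(x, y):
    """Standard Kronecker product x (x) y."""
    return [[x[i // len(y)][j // len(y[0])] * y[i % len(y)][j % len(y[0])]
             for j in range(len(x[0]) * len(y[0]))]
            for i in range(len(x) * len(y))]


def left_direct(a, b):
    """Left direct product a .x b, read as b (x) a (assumption)."""
    return kron(b, a)


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def transpose(x):
    return [list(col) for col in zip(*x)]


C = [[1, 2], [0, -1], [3, 1]]                 # q = 3, m = 2
sigma1 = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]    # common row matrix Sigma_1(n), n = 3
sigma2 = [[5, 2], [2, 1]]                     # common column matrix Sigma_2(m)
m, n = 2, 3
cI = left_direct(C, identity(n))              # [C .x I(n)]
cIt = left_direct(transpose(C), identity(n))  # [C' .x I(n)]

# (C.7) case: Sigma = I(m) .x Sigma_1 should give CC' .x Sigma_1.
rows_case = matmul(matmul(cI, left_direct(identity(m), sigma1)), cIt)
print(rows_case == left_direct(matmul(C, transpose(C)), sigma1))
# (C.6) case: Sigma = Sigma_2 .x I(n) should give C Sigma_2 C' .x I(n).
cols_case = matmul(matmul(cI, left_direct(sigma2, identity(n))), cIt)
print(cols_case == left_direct(matmul(matmul(C, sigma2), transpose(C)), identity(n)))
```

Both lines print `True`. Taking an orthogonal C with $CC' = I$, for example the 2 × 2 rotation `[[0, -1], [1, 0]]`, turns the first case into $I \dot\times \Sigma_1$, which is the statement of (C.9).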


REFERENCES 
A more complete list of references may be found at the end of Part I. Only those items 
specifically referred to in Part II are listed here. 


[1] Gnanadesikan, R., “Contributions to Multivariate Analysis Including Univariate and Multivariate Variance Components Analysis and Factor Analysis,” Institute of Statistics, University of North Carolina, Mimeo Series No. 158 (1956).

[2] Heck, D. L., “Some Uses of the Distribution of the Largest Root in Multivariate Analysis,” Institute of Statistics, University of North Carolina, Mimeo Series No. 194 (1958).

[3] Pillai, K. C. S., “On the distribution of the largest or the smallest root of a matrix in multivariate analysis,” Biometrika, Vol. 43 (1956), pp. 122–127.

[4] Pillai, K. C. S., Concise Tables for Statisticians, The Statistical Center, University of the Philippines, Manila, 1957.

[5] Roy, S. N. and Gnanadesikan, R., “Some contributions to ANOVA in one or more dimensions: II,” Ann. Math. Stat., Vol. 30 (1959), pp. 318–340.

[6] Roy, S. N., Some Aspects of Multivariate Analysis, John Wiley and Sons, New York, 1957.





AN ASSOCIATED POLYNOMIAL FOR LEAST 
SQUARES APPROXIMATIONS 


By RICHARD WARREN


Missouri School of Mines and Metallurgy 


Summary. Tabular reconstruction of differences from a curtailed set of mo- 
ments is ordinarily impossible because of a gap on the integral side of the differ- 
ence table. This gap can be closed by substituting an associated polynomial for 
the one that is being operated upon. The associated polynomial has, at the end 
of a contracted range, terminal differences and moments individually proportional 
to corresponding differences and moments of the polynomial from which it was 
derived. The function is applied to the problem of matching moments to form 
least squares approximations. 


Introduction. The problem of least squares polynomial approximation to a 
set of n equally spaced observations reduces eventually to the problem of con- 
structing a polynomial to have a given set of numbers for its moments. A cur- 
rently favored procedure for this construction is to form a set of linear combina- 
tions of the moments which delivers the approximation arranged in terms of 
Chebychef’s orthogonal polynomials. A practical method for applying Cheby- 
chef's functions to the case of equidistant intervals was described by Charles 
Jordan in 1921 [1]. A more convenient variation was given by R. A. Fisher in 
1928 in the second edition of his textbook [2], and several variations were de- 
scribed by A. C. Aitken in 1932 [3]. Jordan published a revision of his own method 
in 1932 [4]. Fisher’s method is the one known to most statisticians; rules for applying it are quoted in M. G. Kendall’s Advanced Theory of Statistics ([5], Vol. 2, p. 164).

All of the procedures that employ Chebychef’s functions share the same inconvenient feature: after obtaining the arrangement in orthogonal polynomials it is
then necessary to perform another operation of equal magnitude to obtain a 
single numerical value or to tabulate the approximation over the given range. 
Another inconvenient detail is that the arrangement in orthogonal polynomials 
requires extra figures to be carried from the very beginning to absorb arithmetical 
error introduced by rounded quotients that occur early in the work. 

In the present paper we will describe a method for recovering differences from moments without the use of Chebychef’s functions. This is not a variation of previously known methods; it depends on the use of a new type of function. In application, the function makes it possible to obtain the rth-degree least squares approximation to a set of numbers in a form immediately useful for the substitution of numerical values or for tabulation. Incidentally, since the arithmetic does not require division operations until the last stage, arithmetical error is more easily controlled.

Received August 4, 1958; revised August 10, 1960.

The function that makes this computation possible is designed for use on the difference table. We assume the reader is familiar with operations on the difference side of this device. The integral side is less well known, so a foreword on the more extended use of the difference table may be helpful.


The Arrangement. The difference table is an instrument for computation. 
When applied to ordered sets of discrete numbers the use of the table corresponds 
to the use of a plotting linkage, or mechanical differentiator, or integraph in the 
study of continuous functions. When the integral and the difference sides are used together, the device is capable of delivering, by simple arithmetic,
results that would require complicated algebra. 

A rectangular difference table is an array of positions connected by a rule of operation. The positions are arranged in horizontal lines and vertical columns. The rule of operation is that numbers are to be placed in the positions subject to the tabular relation


$a + b = c.$


The relation holds uniformly throughout the table for any three positions in the a, b, c configuration of Table I.


This array of interlocked positions is a computing device. To operate it:

1. Place on the table an initial setting, a pattern of starting numbers. 

2. Begin with any available position and write in it the number that satisfies the tabular relation. (In Table I, if any one of the numbers a, b, or c is missing, its value can be written in by inspection.)

3. Continue in this manner until a region of interest in the table is completely 
filled in. Completion will proceed along a more or less devious path determined 
by the pattern of starting numbers. 


TABLE I
[Table not recoverable from the source scan; only the column heading Δu_x survives.]













Not all patterns are admissible, since some would lead eventually to an in- 
consistency with the tabular relation, but the number and variety of admissible 
patterns is sufficient to make the device a useful and versatile instrument. It 
has been used for many years and in many different forms. The initial setting 
depends on the purpose to which the table is applied and the process of filling 
in the blank spaces in the neighborhood of a given pattern is called differencing, 
tabulating, summing, or computing moments, according to the region that is 
filled in. 

Like its graphical analogues the difference table delivers its results without 
requiring the use of algebra; its operation is mechanical. But to design an initial 
setting for the table, or to explain why the setting will deliver a certain result, it 
is necessary to use algebra. The indexing arrangement shown, the uniform tabular 
relation, and the off-set position of the sum adjacent to any column, automati- 
cally provide the necessary additive unit in the upper limit of a finite integral. 
This establishes in the algebra a convenient correspondence with the familiar notation used for definite integrals in the infinitesimal calculus. For a definite finite integral

(1)   $\sum_a^b u_x = Su_b - Su_a,$

where, by definition,

(2)   $\sum_a^b u_x = \sum_{x=a}^{b-1} u_x.$


In the columns on the right side of the table, the prefix S is not a mechanical operator. It is an identifying prefix in the compound symbol that represents a set, or any member of a set, which is a particular (finite) integral of the numbers in the column adjacent on the left. The particular integral represented by the symbol $S^t u_x$ is the set of numbers in the column with that heading. Any initial setting consistent with the tabular relation can be used to construct a particular integral, and the symbol with prefix S is then used to indicate the result. When one constructs a particular integral, it is necessary to distinguish between the operation and the result.
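Read down a single integral column, (1) and (2) together say that $Su_{x+1} = Su_x + u_x$, so one starting number on some line fixes the whole column. The Python sketch below fills such a column from an arbitrary starting value and checks (1) against the sum in (2). The function `particular_integral` and the data $u_x = x^2$ are illustrative assumptions.

```python
def particular_integral(u, start, first_line=0):
    """Fill a column S u_x on lines first_line .. first_line + len(u).

    u lists u_x for x = first_line, first_line + 1, ...; `start` is the number
    placed on line first_line as part of the initial setting (its choice is
    arbitrary, which is why S u_x is only one particular integral).
    Uses S u_{x+1} = S u_x + u_x, the relation implied by (1) and (2).
    """
    column = {first_line: start}
    for offset, value in enumerate(u):
        x = first_line + offset
        column[x + 1] = column[x] + value
    return column


u = [x * x for x in range(6)]          # u_x = x^2 on lines 0..5 (illustrative)
S = particular_integral(u, start=10)
a, b = 2, 5
print(S[b] - S[a], sum(u[a:b]))        # prints: 29 29
```

Changing `start` shifts every entry of the column by the same constant and leaves the difference in (1) unchanged.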


The Operators Δ and Σ. A correspondence between the members of two sets is observed by substituting one set for the other in the attention of the analyst. In the algebra attached to the difference table, the substitution operators Δ and Σ direct the analyst's attention from the numbers in a given column to the corresponding numbers in another column of the same table. When prefixed to a symbol representing the set of numbers in a given column, Δ directs attention to the adjacent column on the left, and Σ directs it to the adjacent column on the right. These operators can also be applied to a literal expression conditioned to describe the numbers in a given set.

The operators are distributive with respect to algebraic addition, and they commute with constant multipliers other than zero. Repetitions of the operations are indicated by positive integer superscripts that obey the law of addition for exponents.







There is no symbolic operator that deletes numbers from the difference table. Application of the operator Δ to an expression $u_x$, conditioned to describe a set of numbers on the table, will remove a constant term from the attention of the analyst in that expression. It does not remove that term from the position it occupies on the table. Application of the operator Σ to an expression for the set $\Delta u_x$ will restore the constant in the attention of the analyst, either (1) as an additive integration constant, or (2) as a term in a conditioning equation that accompanies the algebraic description of the result of the substitution.

The substitution operators Δ and Σ, when applied to a specific table, commute with each other. But when the operators are applied to an abstract algebraic expression not conditioned to relate to a particular table, the commutative property is lost.

It is important to observe that the result of the operation Σ applied to a set $u_x$ is $Su_x$, not $Su_x + C$. The set represented by the symbol $Su_x$ contains at least one member which was originally a part of the initial setting and was selected without reference to the set $u_x$, so the addition of another integration constant would be incorrect. To identify the particular integral represented by the symbol, it is necessary to look at the difference table the algebra is intended to describe, or to refer to a conditioning equation that identifies the set. The necessary integration constant must, of course, be written in a detailed algebraic description of the set, but it does not appear in the compact symbol.

Repeated substitutions of sets on the difference table may replace a set having prefix S with a set having prefix Δ, or a different substitution may have the reverse effect. The possible changes are exhibited well enough by the arrangement of the columns in the table, but in detail they are

$$\begin{array}{llll}
 & p < q & p = q & p > q \\
\Delta^p(\Delta^q u_x) = & \Delta^{p+q}u_x & \Delta^{p+q}u_x & \Delta^{p+q}u_x \\
\Delta^p(S^q u_x) = & S^{q-p}u_x & u_x & \Delta^{p-q}u_x \\
\Sigma^p(\Delta^q u_x) = & \Delta^{q-p}u_x & u_x & S^{p-q}u_x \\
\Sigma^p(S^q u_x) = & S^{p+q}u_x & S^{p+q}u_x & S^{p+q}u_x
\end{array}$$
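On a given table, these rules amount to index arithmetic on column positions. Put $u_x$ at position 0, $\Delta^q u_x$ at $-q$ and $S^q u_x$ at $+q$; then $\Delta^p$ moves attention p columns left and $\Sigma^p$ moves it p columns right.

The Python sketch below encodes that reading; the function names are hypothetical. Separately, it fills a small numerical table to confirm that differencing an $S^q$ column reproduces the $S^{q-1}$ column next to it. The initial setting $S^k u_0 = 0$ is an arbitrary choice made for the illustration.

```python
def column_name(offset):
    """Heading of the column `offset` places right of u_x (negative: left)."""
    if offset == 0:
        return "u_x"
    return ("S^%d u_x" if offset > 0 else "Delta^%d u_x") % abs(offset)


def apply_delta(p, offset):
    """Delta^p directs attention p columns to the left."""
    return offset - p


def apply_sigma(p, offset):
    """Sigma^p directs attention p columns to the right."""
    return offset + p


def build_table(u, orders):
    """Columns Delta^k u_x and S^k u_x (k = 1..orders), with S^k u_0 = 0."""
    cols = {0: list(u)}
    for k in range(1, orders + 1):
        prev = cols[1 - k]                                # Delta^{k-1} u_x
        cols[-k] = [prev[i + 1] - prev[i] for i in range(len(prev) - 1)]
        running = [0]                                     # S^k u_0 = 0
        for value in cols[k - 1]:
            running.append(running[-1] + value)
        cols[k] = running
    return cols


# Delta^p(S^q u_x) | Sigma^p(Delta^q u_x) for p < q, p = q, p > q.
for p, q in [(1, 3), (2, 2), (3, 1)]:
    print(column_name(apply_delta(p, q)), "|", column_name(apply_sigma(p, -q)))

cols = build_table([x ** 3 for x in range(8)], orders=3)
# Differencing the S^q column lands on the S^{q-1} column, as the table requires.
print(all([b - a for a, b in zip(cols[q], cols[q][1:])] == cols[q - 1] for q in (1, 2, 3)))
```

The three printed rows reproduce the middle two rows of the table above for one choice of p and q in each of the three columns.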


Reduced Factorial Powers. Any ordered set of n numbers can be represented exactly by a conditioned polynomial of some degree less than n, and for some purposes the polynomial expression may be more convenient than the direct representation of the set on the difference table. A polynomial used to describe a set of numbers on the difference table is arranged, for convenience, in factorial powers or reduced factorial powers. The nomenclature “reduced” is due to A. C. Aitken ([3], p. 56). In the notation of Whittaker and Robinson ([6], Ch. 1, Sec. 6; Ch. 3, Sec. 28) a descending factorial power is written with square brackets and an exponent: $[x + a]^p$ means the product


$(x + a)(x + a - 1)(x + a - 2)\cdots$


to p factors, ending in $(x + a - p + 1)$. A reduced factorial power is written as $(x + a)_p$, meaning $[x + a]^p/p!$.

We assume the reader is acquainted with the effects of the operators Δ and Σ on the sets represented by polynomials arranged in terms of these functions.
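As a small check on the definition, the sketch below computes reduced factorial powers $(y)_p = [y]^p/p!$ exactly with Python's `fractions.Fraction`. It then confirms the property that makes them convenient on the difference table: differencing lowers the order by one, $\Delta(x + a)_p = (x + a)_{p-1}$. This is Pascal's rule written in factorial notation. The function names and the values of a and p are illustrative choices.

```python
from fractions import Fraction
from math import factorial


def falling(y, p):
    """Descending factorial power [y]^p = y (y - 1) ... (y - p + 1)."""
    result = 1
    for i in range(p):
        result *= y - i
    return result


def reduced(y, p):
    """Reduced factorial power (y)_p = [y]^p / p!."""
    return Fraction(falling(y, p), factorial(p))


# Differencing lowers the order by one: Delta (x + a)_p = (x + a)_{p-1}.
a, p = 3, 4
print(all(reduced(x + 1 + a, p) - reduced(x + a, p) == reduced(x + a, p - 1)
          for x in range(-10, 10)))    # prints True
```

For a nonnegative integer y, $(y)_p$ is the binomial coefficient "y choose p"; the definition also covers negative arguments, which the difference table needs for lines with negative x.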


Some Properties of the Table. The difference table of a polynomial $u_x$, tabulated for integer values of the independent variable x, is distinguished by a column of zeros for all of the differences of order r + 1, where r is the degree of the polynomial. In practical applications this column of zeros may be part of the starting pattern written explicitly in the table. Some of the useful properties of the difference table of a polynomial will be applied in the present work.

We consider a completed table that contains related sets of differences, values, and particular iterated sums of a polynomial, tabulated for integer values of x, positive, negative, and zero. The lines of the table are indexed by the value of the independent variable. By the numbers on line a we mean the numbers on the line for which x = a. We select a certain column in the table and refer to the numbers in that column as the function of interest. The function of interest might be, at one time, the set $\Delta^t u_x$; at another time, the set $u_x$ itself; or at another time, the set $S^t u_x$, the particular iterated sum of that order which is exhibited on this table. The property is as follows:


The numbers on any line a, extending from the left to and including the number in the column of interest, are the coefficients of the function of interest when it is arranged in reduced factorial powers $(x - a)_p$. The terms are in descending order from left to right, ending in a constant for the last term.
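The first property is Newton's forward-difference formula written about line a. The sketch below does two things for a cubic:

1. It builds the differences on line a.
2. It rebuilds the polynomial from those numbers alone as $\sum_p \Delta^p u_a\,(x-a)_p$ and compares the result with direct evaluation.

The cubic and the choice a = 2 are arbitrary illustrative values, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def reduced(y, p):
    """Reduced factorial power (y)_p = y (y - 1) ... (y - p + 1) / p!."""
    num = 1
    for i in range(p):
        num *= y - i
    return Fraction(num, factorial(p))


def line_coefficients(u, a, r):
    """Delta^p u_a for p = 0..r, differencing values tabulated on x = a..a+r."""
    column = [u(a + i) for i in range(r + 1)]
    coeffs = []
    for _ in range(r + 1):
        coeffs.append(column[0])
        column = [column[i + 1] - column[i] for i in range(len(column) - 1)]
    return coeffs                       # coeffs[p] = Delta^p u_a


def newton(coeffs, a, x):
    """Evaluate sum_p Delta^p u_a * (x - a)_p."""
    return sum(c * reduced(x - a, p) for p, c in enumerate(coeffs))


def u(x):
    return 2 * x ** 3 - 5 * x + 7       # illustrative cubic, r = 3


coeffs = line_coefficients(u, a=2, r=3)
print(coeffs[::-1])                      # line 2, left to right: prints [12, 36, 33, 13]
print(all(newton(coeffs, 2, x) == u(x) for x in range(-5, 10)))   # prints True
```

Read from left to right, the reversed list gives the numbers on line 2 as the property describes them: $\Delta^3 u_2, \Delta^2 u_2, \Delta u_2, u_2$.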


Another property applied in the present work relates specifically to the integral side of the table. This second property is not restricted to the table of a polynomial but applies to any ordered set of numbers.


Given, as the initial setting on the table, a column of n numbers $v_x$, $x = 0$ to $n - 1$ inclusive, and a line of zeros on line zero in the positions $S^t v$, so that $S^t v_0 = 0$ for $t = 1, 2, 3, \ldots$ as far as desired: when the right side of this table is filled in, the sums on line n will be reduced factorial moments of the form


(4)   $S^t v_n = (-1)^{t-1} \sum_{x=0}^{n-1} (x - n + t - 1)_{t-1}\, v_x, \qquad (t > 0).$


The first property is the basis of Newton’s formula for interpolation with forward differences ([6], Ch. 1, Sec. 8). The second is an example of the computation of moments by summation, an operation introduced by G. F. Hardy in the late 1800s in the work of graduating the British mortality tables ([7], Ch. 3, Sec. 9). Ordinarily these two properties are used separately. Our present object is to establish a simple arithmetical link between them, and to compute differences from moments in the same way we compute moments from differences.
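The second property can be checked by filling the right side of a table from the initial setting it describes: a column $v_x$ on lines 0 to n − 1, and zeros on line zero in every integral column. The sketch below compares the entries produced on line n with the closed form (4). The data values and function names are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import factorial


def reduced(y, p):
    """Reduced factorial power (y)_p = y (y - 1) ... (y - p + 1) / p!."""
    num = 1
    for i in range(p):
        num *= y - i
    return Fraction(num, factorial(p))


def sums_on_line_n(v, orders):
    """S^t v_n for t = 1..orders, from S^t v_0 = 0 and S^t v_{x+1} = S^t v_x + S^{t-1} v_x."""
    column = list(v)                 # values on lines 0 .. n-1
    result = []
    for _ in range(orders):
        integral = [0]               # zero on line zero
        for value in column:
            integral.append(integral[-1] + value)
        result.append(integral[-1])  # entry on line n of the new column
        column = integral[:-1]       # lines 0 .. n-1 feed the next column
    return result


def formula_4(v, t):
    """Right side of (4): (-1)^(t-1) * sum_x (x - n + t - 1)_{t-1} v_x."""
    n = len(v)
    return (-1) ** (t - 1) * sum(reduced(x - n + t - 1, t - 1) * vx
                                 for x, vx in enumerate(v))


v = [3, -1, 4, 1, -5, 9]            # n = 6 illustrative observations
sums = sums_on_line_n(v, 8)
print(sums)
print(sums == [formula_4(v, t) for t in range(1, 9)])
```

The last two entries (t = 7 and 8) are zero, in line with the remark that these moments vanish for orders greater than n − 1.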







Matching Moments. In the normal equations which impose the least squares condition on an rth-degree polynomial $u_x$ approximating a set of n observations $v_x$, the numerical coefficients are moments (sums of products). The set of normal equations reduces to the statement that a set of r + 1 moments of the polynomial must equal the same set of moments of the observations.

We have freedom of choice in selecting the type of moment to be matched. 
The only requirement is that the moment multipliers be a linearly independent 
set of polynomials derivable by a non-singular transformation from a set of 
polynomials of degrees 0 to r in x. This set can be chosen to suit the convenience
of the analyst. 

When $r \le n - 1$ the normal equations have a unique solution, and the different types of moments that can be matched all deliver the same approximating polynomial. Depending on the method of solution, however, the terms may be arranged in
different ways. The usual choice for matching is a set of reduced fac- 
torial moments since these are so easily computed by iterated summation of the 
set of observations. 

In practice the degree r of the approximation is limited to values less than n − 1. This is a practical, rather than a mathematical, limitation. One can easily construct a least squares polynomial approximation of degree r equal to or greater than n − 1 by direct differencing. The polynomial will coincide precisely with the set, and the coefficients of the unnecessary terms of higher degree (e.g., the differences on line 0 of order greater than n − 1) may be chosen arbitrarily. A set of n numbers does not have more than n linearly independent power moments (and it may have fewer than n). Moments of order higher than n − 1 are linear combinations of the lower-order moments and contain no new information.

We will use a type of factorial moment which is zero for all orders greater than $n - 1$ (equation (4)).
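As an illustration only, the sketch below computes such moments in Python. It assumes they are the particular sums $S^t v_n$ with $S^t v_0 = 0$, the quantities that appear on the right of equation (6). For data at $x = 0, \ldots, n-1$ these equal $\sum_x (n-1-x)_{t-1} v_x$. The sketch compares iterated summation with the direct binomial-weighted sums and shows that the moments vanish once the order $t - 1$ exceeds $n - 1$. The function names and the data are hypothetical. The data are chosen to lie on the quadratic of equation (16) and are not the paper's observations.

```python
from math import comb

def moments_by_iterated_summation(v, order):
    """Return [S^1 v_n, ..., S^order v_n], the particular sums with S^t v_0 = 0."""
    n = len(v)
    row, result = list(v), []
    for _ in range(order):
        sums = [0]                      # S w_0 = 0
        for w in row:
            sums.append(sums[-1] + w)   # S w_{x+1} = S w_x + w_x
        result.append(sums[n])          # the value on line n
        row = sums[:n]                  # S w_0 .. S w_{n-1} feed the next summation
    return result

def moments_direct(v, order):
    """The same numbers as weighted sums: S^t v_n = sum_x (n-1-x)_{t-1} v_x."""
    n = len(v)
    return [sum(comb(n - 1 - x, t - 1) * vx for x, vx in enumerate(v))
            for t in range(1, order + 1)]

v = [9, 5, 11, 27, 53, 89, 135]                 # hypothetical data, n = 7
print(moments_by_iterated_summation(v, 3))      # [329, 399, 385]
print(moments_direct(v, 9)[7:])                 # orders t = 8, 9 vanish: [0, 0]
```

Only additions are used in the iterated version, which is why the text calls these moments "so easily computed".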


Rearranging the Solution. So far as algebra is concerned, any computation of the first $r + 1$ moments of the set $v_x$ completes the construction of the approximating polynomial $u_x$. The numbers are taken as the first $r + 1$ moments of the polynomial. They are also the coefficients of the polynomial when it is arranged in terms of a complementary family of polynomials orthogonal to the moment multipliers in the given range. For the commonly used types of moments (those that are easiest to compute) the complementary polynomials are not well suited for the substitution of numerical values or for tabulation. For these purposes it is more convenient to have the polynomial arranged in reduced factorial powers. Rearrangements of this kind are most easily performed by operating directly on the difference table.

When $n = r + 1$ it is evident that the difference table of a polynomial given in terms of its moments can be reconstructed in reverse order. When $n$ is greater than $r + 1$, however, there is a gap on the integral side that cannot be filled in directly. This gap can be closed by substituting, for the polynomial $u_x$ which is





LEAST SQUARES APPROXIMATIONS 975 


to be rearranged, an associated polynomial $u'_x$ of the same degree, and operating on the associated polynomial in the contracted range $x = 0$ to $x = r + 1$. In this range differences can be recovered from moments simply by reversing the summation process ordinarily used to compute the moments.


The Participating Functions. The complete process which we are now in a position to describe deals with differences and particular finite integrals of three distinct functions: $v_x$, $u_x$, and $u'_x$.

The set of observations $v_x$ is a set of $n$ discrete numbers, known only for $x = 0, 1, 2, \ldots, n - 1$. Nothing is known or assumed about possible values of $v_x$ at values of $x$ other than these. The only part played by the set $v_x$ is to furnish the moments that determine the approximating polynomial.

The approximation $u_x$ is a polynomial of degree $r$. This polynomial is simply an algebraic form in which numbers can be substituted. The form is pinned to the set $v_x$ at the $n$ values $x = 0, 1, 2, \ldots, n - 1$ by the least squares condition, but it has no assigned connection with possible values of $v_x$ at other points. It should be clear that extrapolation with a function chosen to suit arithmetical convenience has no mathematical justification. When $r \le n - 1$ the $r$th degree least squares polynomial approximation to a given set is unique. Even so, there is an unlimited number of different least squares approximations having the same number of parameters that are based on functions other than polynomials.

We assume the approximating polynomial will be used to summarize experimental information, such as a frequency distribution. In practical applications the degree $r$ will be some integer less than $n - 1$. Suppose the analyst has observed that a set of differences $\Delta^r v_x$ is approximately constant, and the variations are assumed to be due to chance. Then the choice of a polynomial to describe the distribution function is reasonable. The observed distribution is itself empirical. Using an empirical formula to summarize it has the advantage of conveying the useful information in a smaller number of terms.

The particular integral $S^t u_x$ is a polynomial of degree $r + t$. In the present application it will be the particular finite integral of the polynomial $u_x$ characterized by the values $S^t u_0 = 0$ for $t = 1, 2, 3, \ldots$ as far as desired.

Similarly, $S^t u'_x$ will be the particular finite integral of $u'_x$ characterized by the same set of initial values $S^t u'_0 = 0$ for $t = 1, 2, 3, \ldots$ as far as desired.

Both polynomials $u_x$ and $u'_x$ will be tabulated for integer values of $x$. The range of tabulation for $u_x$ is $x = 0$ to $n$, and the range of tabulation for $u'_x$ is $x = 0$ to $r + 1$.

The associated polynomial $u'_x$ is a polynomial of degree $r$ derived from the polynomial $u_x$. The terminal differences and the sums of $u'_x$ on line $r + 1$ are individually proportional to those of $u_x$ on line $n$. After the differences of the associated polynomial have been reconstructed from its sums, the proportionality factors can be removed, leaving the terminal differences of the desired polynomial. We construct the associated polynomial from a special set of particular integrals of the original polynomial. This is the set obtained by choosing zero for the initial values of all of the sums on line zero.

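For a given polynomial $u_x$ the associated polynomial $u'_x$ is defined by the following equation:

$$u'_x = \Delta^{n}\left[(x - r - 1)_{n-r-1}\, S^{r+1} u_x\right]. \tag{5}$$

Then the proportionality factors are given by

$$S^t u'_{r+1} = (n - t)_{n-r-1}\, S^t u_n \qquad (t > 0), \tag{6}$$

$$\Delta^t u'_{r+1} = (n + t)_{n-r-1}\, \Delta^t u_n \qquad (t \ge 0). \tag{7}$$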


Here $S^t u_x$ is the $t$th iterated sum of $u_x$ formed from the particular set of initial values $S^t u_0 = 0$ for $t = 1, 2, 3, \ldots$ as far as desired. Likewise, $S^t u'_x$ is the $t$th iterated sum of $u'_x$ formed from the particular set of initial values $S^t u'_0 = 0$ for $t = 1, 2, 3, \ldots$ as far as desired. The proportionality factors are reduced factorial powers of the descending kind and can be obtained from a table of binomial coefficients.
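The factors in equations (6) and (7) can be read off with a binomial-coefficient routine. The sketch below assumes the reduced factorial power $(a)_b$ equals the binomial coefficient $\binom{a}{b}$, which holds for non-negative integer $a$. The function name is hypothetical. For $n = 7$, $r = 2$ it yields the multipliers for the sums and the divisors for the differences.

```python
from math import comb

def proportionality_factors(n, r):
    """Factors of equations (6) and (7), reading (a)_b as the binomial coefficient C(a, b)."""
    if not 0 <= r <= n - 1:
        raise ValueError("need 0 <= r <= n - 1")
    m = n - r - 1
    sums = {t: comb(n - t, m) for t in range(1, r + 2)}    # S^t u'_{r+1} / S^t u_n
    diffs = {t: comb(n + t, m) for t in range(0, r + 1)}   # Delta^t u'_{r+1} / Delta^t u_n
    return sums, diffs

sums, diffs = proportionality_factors(7, 2)
print(sums)    # {1: 15, 2: 5, 3: 1}
print(diffs)   # {0: 35, 1: 70, 2: 126}
```

The relation $(a)_b = (a)_{a-b}$ used below to rewrite the factors is the symmetry of binomial coefficients, $\binom{a}{b} = \binom{a}{a-b}$.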


Algebraic Demonstration. In numerical computation the construction of the associated polynomial consists merely in applying the proportionality factors to the set of moments. The only need for algebra is to identify the proportionality factors. To explain why the process delivers its intended result by such simple means, we will use one of the difference calculus analogues of Leibniz's theorem. This algebra is not used in the performance of the computation.

Referring to equations (3), it will be observed that repeated application of the operator $S$ to the defining equation for $u'_x$ reduces the exponent of the operator $\Delta$ on the right, and that differencing increases it. We apply the operator $S^t$, or the operator $\Delta^t$, to the defining equation (5). We then expand the right side by using one of the difference calculus forms ([8], Ch. 2, Art. 10, Ex. 3) for the difference of a product:

$$\Delta^{n}(u_x v_x) = \sum_{j=0}^{n} \binom{n}{j}\, \Delta^{n-j} u_x\, \Delta^{j} v_{x+n-j}. \tag{8}$$
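Formula (8) follows from writing the shift of the product as $\Delta = \Delta_u E_v + \Delta_v$, where the two parts commute, and expanding binomially. It can also be checked numerically. The sketch below uses two arbitrary integer-valued test functions, which are hypothetical, and compares both sides of (8) exactly.

```python
from math import comb

def delta(f, k):
    """k-th forward difference of f, returned as a new function of x."""
    if k == 0:
        return f
    g = delta(f, k - 1)
    return lambda x: g(x + 1) - g(x)

def leibniz_rhs(u, v, n, x):
    """Right side of (8): sum_j C(n, j) * Delta^{n-j} u_x * Delta^j v_{x+n-j}."""
    return sum(comb(n, j) * delta(u, n - j)(x) * delta(v, j)(x + n - j)
               for j in range(n + 1))

u = lambda x: x**3 - 2 * x      # arbitrary test functions
v = lambda x: 3**x
uv = lambda x: u(x) * v(x)

ok = all(delta(uv, n)(x) == leibniz_rhs(u, v, n, x)
         for n in range(6) for x in range(5))
print(ok)   # True
```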


This gives, for the particular sums of the associated polynomial, with initial values $S^t u'_0 = 0$ for all $t > 0$,

$$S^t u'_x = \sum_{j=0}^{n-t} (n - t)_j\, \Delta^{n-t-j}(x - r - 1)_{n-r-1}\; \Delta^{j} S^{r+1} u_{x+n-t-j}. \tag{9}$$


In this expanded form all but one of the terms either vanish identically or contain the factor $(x - r - 1)$, and so vanish at $x = r + 1$. The single non-vanishing term is the one in which that factor has been reduced to unity by repeated differencing; this is the term for which $j = r + 1 - t$. Writing out that term for $x = r + 1$,

$$S^t u'_{r+1} = (n - t)_{r+1-t}\, S^t u_n. \tag{10}$$

Using the relation $(a)_b = (a)_{a-b}$, the proportionality factors for the sums can be written more conveniently as $(n - t)_{n-r-1}$.


The proportionality factors for the differences of the associated polynomial at $x = r + 1$ may be verified similarly. The factor for $\Delta^t u'_{r+1}$ is obtained by applying $\Delta^t$ to equation (5) and expanding the right side. For any integer $t \ge 0$ the expansion is


$$\Delta^t u'_x = \sum_{j=0}^{n+t} (n + t)_j\, \Delta^{n+t-j}(x - r - 1)_{n-r-1}\; \Delta^{j} S^{r+1} u_{x+n+t-j}. \tag{11}$$

Repeated differencing of the last factor on the right reduces the order of the $S$ to zero, and for $j \ge r + 1$ the factor is a difference $\Delta^{j-r-1}$ of the object function $u$. At $x = r + 1$ the single non-vanishing term is the one for which $j = r + 1 + t$, and the result is


$$\Delta^t u'_{r+1} = (n + t)_{r+1+t}\, \Delta^t u_n. \tag{12}$$


So the differences and sums of $u'$ on line $r + 1$ of its difference table are individually proportional to the corresponding differences and sums of $u$ on line $n$ of the difference table of $u$. Only these terminal values are proportional; the others are not.

Integration constants are an important part of the process, and the associated polynomial is designed for use only with the particular set $S^t u_0 = 0$. In practice the index $t$ will not be greater than $r + 1$. This is a practical rather than a mathematical restriction. The index $t$ can be greater than $r + 1$, but equations (6) and (7), though still true, then reduce to the vacuous identity $0 = 0$. In equation (7) the index $t$ can be zero, with the usual interpretation of $\Delta^0 u_x$ as identical with $u_x$.

We avoid the use of negative exponents with $\Delta$ and $S$ because, when these operators are expressed as matrices, they are singular and have no reciprocals.


Numerical Demonstration. The application of the associated polynomial to 
least squares approximation can be more easily understood by following the 
steps in a numerical example. 


To construct a second degree least squares polynomial to represent the set of 
seven “observations” 


x      0    1    2
v_x    6    8   14


The process requires the following 5 steps: 

1. Compute the moments to be matched. 

2. Apply the proportionality factors to the moments. 

3. Reconstruct the terminal differences of the associated polynomial. 

4. Remove the proportionality factors from the differences. 

5. Tabulate the result. 
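The five steps can be sketched end to end in Python. This is a hedged illustration, not the paper's tabular layout. It assumes the moments are the sums $S^t v_n$ with zero initial values, as in equation (6). Step 3 is carried out by a triangular back-substitution followed by Newton's forward formula, rather than by filling in a difference table by hand. Exact rational arithmetic (`fractions.Fraction`) is used so that no rounding enters. The names `warren_fit` and `rf` are hypothetical. Applied to data lying exactly on the quadratic of equation (16), which are not the paper's observations, the sketch returns the terminal differences 191, 66, 10 of equation (14).

```python
from fractions import Fraction
from math import comb, factorial

def rf(a, k):
    """Reduced factorial power (a)_k = a(a-1)...(a-k+1)/k! for any integer a."""
    num = 1
    for i in range(k):
        num *= a - i
    return num // factorial(k)          # exact: k consecutive integers are divisible by k!

def warren_fit(v, r):
    """Steps 1-5 for integer (or Fraction) observations v_0..v_{n-1}, degree r <= n-1.
    Returns (coef, table) with coef[t] = Delta^t u_n and table[x] = u_x."""
    n = len(v)
    if not 0 <= r <= n - 1:
        raise ValueError("need 0 <= r <= n - 1")
    m = n - r - 1
    # Step 1: moments S^t v_n, t = 1..r+1, by iterated summation (S^t v_0 = 0)
    moments, row = {}, list(v)
    for t in range(1, r + 2):
        sums = [0]
        for w in row:
            sums.append(sums[-1] + w)
        moments[t] = sums[n]
        row = sums[:n]
    # Step 2: sums of the associated polynomial on line r+1, equation (6)
    s = {t: comb(n - t, m) * moments[t] for t in moments}
    # Step 3: u'_0..u'_r from S^t u'_{r+1} = sum_j C(r-j, t-1) u'_j, then its differences
    up = []
    for k in range(r + 1):
        up.append(s[r + 1 - k] - sum(comb(r - j, r - k) * up[j] for j in range(k)))
    d0, col = [], up                    # forward differences of u' at x = 0
    while col:
        d0.append(col[0])
        col = [b - a for a, b in zip(col, col[1:])]
    term = [sum(comb(r + 1, k - t) * d0[k] for k in range(t, r + 1))
            for t in range(r + 1)]      # Delta^t u'_{r+1}
    # Step 4: remove the proportionality factors of equation (7)
    coef = [Fraction(term[t], comb(n + t, m)) for t in range(r + 1)]
    # Step 5: tabulate u_x = sum_t Delta^t u_n (x - n)_t for x = 0..n-1
    table = [sum(c * rf(x - n, t) for t, c in enumerate(coef)) for x in range(n)]
    return coef, table

coef, table = warren_fit([9, 5, 11, 27, 53, 89, 135], 2)
print([int(c) for c in coef])           # [191, 66, 10]
```

For data that do not lie on a quadratic, the returned table satisfies the normal equations: the residuals $v_x - u_x$ are orthogonal to $1, x, \ldots, x^r$.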

Three of the steps are tabular computations. In each of these the arithmetic consists of (1) writing down the pattern of starting numbers and (2) filling in the blank spaces with the numbers necessary to preserve the tabular relation.


The example printed here is shown as it appears when completed. The starting 
numbers are shown in bold face. 





STEP 1
Computation of the Necessary Moments

STEP 2
Multiplying by:
Gives the sums:
of the associated polynomial.

STEP 3
Reconstruction of the Differences from the Sums of the Associated Polynomial

STEP 4
Dividing by:
Gives the differences:
of the approximating polynomial.

STEP 5
Tabulation of the Values of the Approximating Polynomial

CHECK
Apply Step 1 to the Result of Step 5


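The sum of the squared differences between the observations and their approximations can be obtained by taking advantage of the least squares relationship

$$\sum_{x=0}^{n-1} (v_x - u_x)^2 = \sum_{x=0}^{n-1} v_x^2 - \sum_{x=0}^{n-1} u_x^2, \tag{13}$$

which holds when $u_x$ is a least squares polynomial approximation to $v_x$ in the range of summation. In that case the residuals $v_x - u_x$ are orthogonal to $u_x$, so $\sum v_x u_x = \sum u_x^2$. The absence of a cross product term is the advantage here.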
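Equation (13) can be checked numerically in a few lines. The sketch below takes the simplest case, a least squares straight line ($r = 1$), and fits it in exact rational arithmetic to made-up data. The data and function names are hypothetical. It compares the direct sum of squared residuals with the right side of (13). The second comparison uses an arbitrary line that is not a least squares fit, and there the shortcut fails.

```python
from fractions import Fraction

def least_squares_line(v):
    """Exact least squares line through (x, v_x), x = 0..n-1 (needs n >= 2)."""
    n = len(v)
    xbar, vbar = Fraction(n - 1, 2), Fraction(sum(v), n)
    sxx = sum((x - xbar) ** 2 for x in range(n))
    sxv = sum((x - xbar) * (vx - vbar) for x, vx in enumerate(v))
    slope = sxv / sxx
    return [vbar + slope * (x - xbar) for x in range(n)]

def ssr_direct(v, u):
    """Sum of squared differences computed term by term."""
    return sum((a - b) ** 2 for a, b in zip(v, u))

def ssr_by_13(v, u):
    """Right side of equation (13): valid only when u is the least squares fit."""
    return sum(a * a for a in v) - sum(b * b for b in u)

v = [3, 1, 4, 1, 5, 9, 2]                     # hypothetical observations, n = 7
u = least_squares_line(v)
print(ssr_direct(v, u) == ssr_by_13(v, u))    # True
arbitrary = list(range(7))                    # not a least squares fit
print(ssr_direct(v, arbitrary), ssr_by_13(v, arbitrary))   # 50 46
```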

If one requires only an algebraic expression for the approximating polynomial, Step 5 can be omitted. Step 4 delivers the coefficients of the polynomial in a form convenient for the substitution of any numerical value that may be of interest. In the example illustrated, the polynomial approximation is delivered by Step 4 as

$$u_x = 10(x - 7)_2 + 66(x - 7) + 191. \tag{14}$$


If Step 5 is completed, the difference table presents the coefficients of the approximating polynomial in all of the different arrangements that are described graphically in a Fraser diagram. For example, taking the numbers on line 0 as the coefficients, another arrangement of the solution is

$$u_x = 10(x)_2 - 4x + 9. \tag{15}$$


The arrangement of a polynomial in continuous powers $x^p$ is familiar through habit and custom, but it has no mathematical pre-eminence. It is convenient for multiplication and division, but for other operations there are more convenient arrangements. The arrangement in reduced continuous powers $x^p/p!$ has certain advantages in connection with differentiation and integration. An arrangement in reduced factorial powers is convenient for operations connected with the difference table and is equally convenient for the substitution of numerical values.

An arrangement in continuous powers is seldom necessary in the calculus of finite differences, except in problems connected with interpolation or sub-tabulation. If it is needed, the expansion of the factorial powers and the collection of terms is a minor arithmetical detail. In the illustration, starting from the arrangement in equation (15), it can be performed by inspection:

$$u_x = 5x^2 - 9x + 9. \tag{16}$$


In more complicated examples a collection in continuous powers would be performed by writing out the expansions of the (un-reduced) factorial powers or, what amounts to the same thing, by using a table of Stirling's numbers.
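The collection in continuous powers can be automated by expanding each un-reduced factorial power $x(x-1)\cdots(x-k+1)$. Its coefficients are the signed Stirling numbers of the first kind. The sketch below builds those coefficients by repeated multiplication by $(x - i)$ rather than from a printed table. The function names are hypothetical. Applied to equation (15), it reproduces equation (16).

```python
from fractions import Fraction
from math import factorial

def falling_factorial_coeffs(k):
    """Coefficients, in ascending powers of x, of x(x-1)...(x-k+1).
    These are the signed Stirling numbers of the first kind s(k, j)."""
    coeffs = [1]
    for i in range(k):
        new = [0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):   # multiply the current polynomial by (x - i)
            new[j] -= i * c
            new[j + 1] += c
        coeffs = new
    return coeffs

def reduced_factorial_to_powers(a):
    """a[k] is the coefficient of (x)_k = x(x-1)...(x-k+1)/k!.
    Returns the coefficients of 1, x, x^2, ... ."""
    out = [Fraction(0)] * len(a)
    for k, ak in enumerate(a):
        for j, s in enumerate(falling_factorial_coeffs(k)):
            out[j] += Fraction(ak * s, factorial(k))
    return out

# equation (15): 9 - 4(x)_1 + 10(x)_2  ->  coefficients 9, -9, 5 of 1, x, x^2
print(reduced_factorial_to_powers([9, -4, 10]))
```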

It is interesting to observe in numerical examples that the degree $r$ of the two polynomials $u_x$ and $u'_x$ may be redundant (the coefficient of the highest power can be zero). Equations (6) and (7) will still be true, and reconstruction of the difference table will automatically deliver the zero coefficients along with the others. Of course both polynomials will be treated as being of the same degree, and both will be found to have the same number of terms with zero coefficients.

As an example for practice the reader may construct the fourth degree least 
squares approximation to the set of seven numbers used in the numerical demon- 
stration. 


Arithmetical Error. The numbers in the example illustrated were chosen to make the demonstration easy to follow. In practice the coefficients of the approximating polynomial (the differences on line $n$) will have recurring decimal parts that must be rounded off. Rounding the coefficients introduces arithmetical error in the values of the polynomial computed from them. For values of $x$ far from $n$ the error may be greater than the value of the polynomial itself. To absorb the arithmetical error, one can carry extra figures in the results of the division operations that remove the proportionality factors and then discard the extra figures after tabulating the approximation. Consider an $r$th degree polynomial arranged in terms of $(x - n)$ with rounded coefficients. It can be shown by finite integration that the maximum possible error in the last decimal place of $u_x$, when computed from it, is $\pm 0.5\,(n + r)_r$. To determine how many extra figures should be carried in the differences of $u_x$, write out the numerical value of this maximum and count the effective number of figures in it to the left of the decimal point. For example, when $n = 12$ and $r = 4$, the error in the last place of any value in the range is not greater than $\pm 910$. The effective number of figures would be counted as four, to permit the usual rounding process when discarding the extra figures.
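The bound $\pm 0.5\,(n + r)_r$ can be confirmed by brute force. The sketch below assumes each coefficient $\Delta^t u_n$ of $u_x = \sum_t \Delta^t u_n\,(x - n)_t$ carries a rounding error of at most half a unit in the last place. It then finds the worst case over $x = 0, \ldots, n-1$, taking all errors with the signs of $(x - n)_t$. The function names are hypothetical. For $n = 12$, $r = 4$ the result matches the 910 quoted above.

```python
from math import comb, factorial

def rf(a, k):
    """Reduced factorial power (a)_k = a(a-1)...(a-k+1)/k! for any integer a."""
    num = 1
    for i in range(k):
        num *= a - i
    return num // factorial(k)

def worst_case_error(n, r):
    """Largest error, in units of the last place, of u_x for x = 0..n-1 when each of the
    r+1 coefficients is off by at most 0.5 and the signs conspire."""
    return max(sum(abs(rf(x - n, t)) for t in range(r + 1)) for x in range(n)) / 2

print(worst_case_error(12, 4))   # 910.0
print(comb(12 + 4, 4) / 2)       # 910.0, i.e. 0.5 (n + r)_r
```

The worst case falls at $x = 0$, the point farthest from the origin of the arrangement at $x = n$, in line with the remark that the error grows for values of $x$ far from $n$.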

The control of arithmetical error by extra figures is satisfactory enough, but in the present computation another method is available. We can postpone all approximate divisions to the very end of the work by tabulating a multiple $k u_x$ instead of $u_x$ itself. The multiplier $k$ can be any common multiple of the proportionality factors to be removed from the differences. The least common multiple is convenient, but any common multiple will do.

To tabulate $k u_x$, multiply all the terminal differences $\Delta^t u'_{r+1}$, including $\Delta^0 u'_{r+1}$, by $k$ before removing the proportionality factors. Then remove the proportionality factors, which will divide out exactly since $k$ is a common multiple of them. Tabulate the result and finally, as a last step, divide the individual values by $k$, carrying out the division in each case to as many decimal places as desired. All of the figures will be free of arithmetical error. This provides complete control over the arithmetical accuracy of the computation.
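The multiplier method can be sketched with integer arithmetic, leaving the only rounding to the final division. The sketch assumes the terminal differences $\Delta^t u'_{r+1}$ are integers, as they are for integer observations. The input values 6685, 4620, 1260 are not printed in the paper. They follow from equations (7) and (14) for $n = 7$, $r = 2$ as $35 \cdot 191$, $70 \cdot 66$ and $126 \cdot 10$. The function names are hypothetical.

```python
from decimal import Decimal
from functools import reduce
from math import comb, factorial, gcd

def rf(a, k):
    """Reduced factorial power (a)_k for any integer a (exact integer result)."""
    num = 1
    for i in range(k):
        num *= a - i
    return num // factorial(k)

def tabulate_scaled(term, n, r):
    """term[t] = Delta^t u'_{r+1} (integers). Returns k and the integers k*u_x, x = 0..n-1."""
    m = n - r - 1
    factors = [comb(n + t, m) for t in range(r + 1)]
    k = reduce(lambda a, b: a * b // gcd(a, b), factors)     # least common multiple
    scaled = [k * d // f for d, f in zip(term, factors)]     # k * Delta^t u_n, exact
    table = [sum(c * rf(x - n, t) for t, c in enumerate(scaled)) for x in range(n)]
    return k, table

def divide(values, k, places):
    """Last step: divide each k*u_x by k, rounding only here."""
    q = Decimal(1).scaleb(-places)
    return [(Decimal(v) / Decimal(k)).quantize(q) for v in values]

k, table = tabulate_scaled([6685, 4620, 1260], n=7, r=2)
print(k)   # 630
print(divide(table, k, 2))
```

With non-polynomial integer data the divisions by $k$ in `divide` are generally inexact. The quotients are still correct to the requested number of places, because every figure before that step is exact.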


REFERENCES
[1] Ch. Jordan, "Sur une série de polynomes dont chaque somme partielle représente la meilleure approximation d'un degré donné suivant la méthode des moindres carrés," Proc. London Math. Soc., Vol. 20 (1921), pp. 298-325.
[2] R. A. Fisher, Statistical Methods for Research Workers, 2nd Ed., Oliver and Boyd, London, 1928.
[3] A. C. Aitken, "Graduation of data by the orthogonal polynomials of least squares," Proc. Roy. Soc. Edin., Vol. 53 (1932), pp. 54-78.
[4] Ch. Jordan, "Approximation and graduation according to the principle of least squares by orthogonal polynomials," Ann. Math. Stat., Vol. 3 (1932), pp. 257-357.
[5] Maurice G. Kendall, Advanced Theory of Statistics, 3rd Ed., C. Griffin, London, 1946.
[6] E. T. Whittaker and G. Robinson, The Calculus of Observations, 2nd Ed., Blackie and Son, London, 1926.
[7] W. Palin Elderton, Frequency Curves and Correlation, 2nd Ed., C. & E. Layton, London, 1927.
[8] George Boole, The Calculus of Finite Differences, 3rd Ed., Stechert & Co., New York, 1926.





ASYMPTOTIC RATE OF DISCRIMINATION FOR MARKOV
PROCESSES¹


By Lambert H. Koopmans
Sandia Corporation 


1. Summary. Simple hypotheses $H_P$ and $H_Q$, specifying two distinct positive transition densities $p(x \mid y)$ and $q(x \mid y)$ and initial densities $p(x)$ and $q(x)$ with respect to a finite Lebesgue-Stieltjes measure, are assumed for a discrete time parameter Markov process. Let $R_n$ be the likelihood ratio statistic based on the first $n + 1$ observations of the process. Consider the class of sequences of likelihood ratio tests $T(a, \alpha) = \{[R_n > n^{\alpha} a] : n = 0, 1, 2, \ldots\}$ generated by letting $a$ and $\alpha$ vary over the real numbers. Under certain regularity assumptions on $K_t(x, y) = p^{1-t}(x \mid y)\, q^t(x \mid y)$ and the initial densities $p$ and $q$, the subclass of consistent sequences is determined. The limiting rates at which the error probabilities tend to zero for tests in this subclass are also found.

A definition of the best asymptotic rate for distinguishing between $H_P$ and $H_Q$ is made for the class of consistent tests. This "asymptotic rate of discrimination" is evaluated and is shown to be attained by a certain subclass of these tests.

Some applications and extensions of the theory to infinite Lebesgue-Stieltjes 
measures are given. 


2. Introduction. This paper is primarily a study of the asymptotic properties 
of the tail probabilities for the likelihood ratio statistic as applied to testing 
simple hypotheses for discrete time parameter Markov processes. Similar in- 
vestigations have been carried out by Cramér [2], Chernoff [1], and Thomasian 
[6] for sums of independent random variables and from more general points of 
view. In the special case that the Markov process reduces to a sequence of inde- 
pendent, identically distributed random variables, most of the results proved 
herein may be obtained from these papers. 

Let $\mathcal{X}$ be the real line, $\mathfrak{B}$ the Borel sets, and $\mu$ a Lebesgue-Stieltjes measure defined on $(\mathcal{X}, \mathfrak{B})$. Attention is restricted to the situation in which the distribution of a Markov process $\mathfrak{M} = \{X_n : n = 0, 1, 2, \ldots\}$ is determined by two densities with respect to $\mu$. The first is a transition density $p(x \mid y)$, measurable in $y$ for fixed $x$ and a probability density in $x$ for fixed $y$. The second is an initial density $p(x)$.

Let $H_P$ and $H_Q$ be simple hypotheses for $\mathfrak{M}$ specifying transition densities $p(x \mid y)$ and $q(x \mid y)$ and initial densities $p$ and $q$. The likelihood ratio statistic $R_n$ based on the first $n + 1$ observations is then

$$R_n(X_0, X_1, \ldots, X_n) = \log \frac{q(X_0)\, q(X_1 \mid X_0) \cdots q(X_n \mid X_{n-1})}{p(X_0)\, p(X_1 \mid X_0) \cdots p(X_n \mid X_{n-1})} \tag{2.1}$$
Received September 28, 1959; revised June 4, 1960. 
¹ The major portion of the work on this paper was carried out while the author was a graduate student in the Department of Statistics, University of California, Berkeley. The work was supported (in part) by funds provided under contract AF41(657)-29 with the USAF School of Aviation Medicine, Randolph Field, Texas.







DISCRIMINATION FOR MARKOV PROCESSES 983 


and a (nonrandomized) likelihood ratio test is a set $[R_n > k_n]$ for some $k_n \in \mathcal{X}$, where the square bracket indicates the set in $\mathfrak{B}^{n+1}$ for which the inequality is satisfied.
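For intuition, the statistic and one member of a test sequence can be sketched for a chain on a finite state space. This is an assumption made only for illustration: $\mu$ is taken to be counting measure, which is a finite measure. $R_n$ is computed on the log scale, as in the proof of Lemma 3.2 below. The matrix layout `P[y][x]` $= p(x \mid y)$ and all names and numbers are hypothetical.

```python
import math

def log_likelihood_ratio(path, p0, P, q0, Q):
    """R_n for a path X_0..X_n of a finite-state chain (log scale).
    Hypothetical layout: P[y][x] = p(x | y), Q[y][x] = q(x | y); p0, q0 are initial probabilities."""
    r = math.log(q0[path[0]]) - math.log(p0[path[0]])
    for y, x in zip(path, path[1:]):
        r += math.log(Q[y][x]) - math.log(P[y][x])
    return r

def lr_test(path, p0, P, q0, Q, a, alpha):
    """Member n = len(path) - 1 of T(a, alpha): True when R_n > n**alpha * a."""
    n = len(path) - 1
    return log_likelihood_ratio(path, p0, P, q0, Q) > n ** alpha * a

P = [[0.5, 0.5], [0.5, 0.5]]
Q = [[0.25, 0.75], [0.75, 0.25]]
p0 = q0 = [0.5, 0.5]
print(log_likelihood_ratio([0, 1, 0], p0, P, q0, Q))   # 2 log 1.5, about 0.811
```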

The error probabilities associated with $[R_n > k_n]$ are $P[R_n > k_n]$ and $Q[R_n \le k_n] = 1 - Q[R_n > k_n]$, where $P$ and $Q$ are the probability distributions induced on $\mathfrak{M}$ under $H_P$ and $H_Q$. For simplicity it is assumed that $k_n$ is of the form $n^{\alpha} a$ for real $\alpha$ and $a$. The results remain valid if $k_n$ is taken to be $O(n^{\alpha} a)$. The tail probabilities to be studied are the error probabilities for the consistent members of the class of test sequences $T(a, \alpha) = \{[R_n > n^{\alpha} a] : n = 0, 1, 2, \ldots\}$ for $\alpha$ and $a$ in $\mathcal{X}$.

As in [1], [2], and [6], the moment-generating function will play an important role in evaluating the limiting rates at which these tail probabilities tend to zero. In this case it is $M_n(t) = E_P[\exp(R_n t)]$, where $E_P$ denotes expectation with respect to $P$. $M_n(t)$ can be expressed in terms of the $n$th iterate of a certain integral operator, to be defined later, and this motivates the use of operator theory in this investigation. For a complete treatment of the linear space theory pertinent to this study, the reader is referred to Zaanen [9].

The following assumptions apply throughout the paper:

A1. Let $K_t(x, y) = p^{1-t}(x \mid y)\, q^t(x \mid y)$. There exists a set $A \in \mathfrak{B}$ such that

(ii). $K_t(x, y) > 0$ on $A \times A$ and $K_t(x, y) = 0$ on $(A \times A)^c$ almost surely with respect to $\mu \times \mu$ (a.s. $(\mu \times \mu)$) for $t = 0$ and $1$.

Hereafter all functions and measures will be restricted to $A$, its products, and their restricted Borel sets. For integrals over the range $A$, no domain of integration will be indicated.

Clearly, Part (ii) could equivalently be stated for the range $0 < t \le 1$.

A2. There exists $\delta_1 > 0$ such that the functions $p^{1-t}(x)\, q^t(x)$ are in $\mathcal{L}_2 = L_2(A, \mu)$ for $-\delta_1 \le t \le 1 + \delta_1$.

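A3. There exists $\delta_2 > 0$ such that

$$\sum_{n=0}^{\infty} \frac{1}{n!}\, |||K_{t,n}|||\, \delta_2^{\,n} < \infty$$

for $t = 0, 1$. Here $K_{t,n}(x, y) = \{\log [q(x \mid y)/p(x \mid y)]\}^n K_t(x, y)$, and $|||\cdot|||$ is the double norm defined for (possibly complex valued) $\mu \times \mu$ measurable functions $G(x, y)$ by

$$|||G||| = \left\{\iint |G(x, y)|^2\, d\mu(x)\, d\mu(y)\right\}^{1/2}.$$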

This assumption is a specialization of a condition imposed by Wolf in [8] for analyticity of operators in Banach space.

Lemma 2.1. Assumption A3 implies the following conditions:

(C1). $|||K_t||| < \infty$ for $0 \le t \le 1$.

(C2). $\sum_{n=0}^{\infty} \frac{1}{n!}\, |||K_{t,n}|||\, \delta_2^{\,n} < \infty$ for $0 \le t \le 1$.





984 LAMBERT H. KOOPMANS 


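Proof. (C1) is an immediate consequence of (C2). (C2) follows from A3 by two applications of Hölder's inequality. Suppose $\int |f|^r d\nu < \infty$ and $\int |g|^s d\nu < \infty$, where $r^{-1} + s^{-1} = 1$. Then $\int |fg|\, d\nu \le [\int |f|^r d\nu]^{1/r}\, [\int |g|^s d\nu]^{1/s}$, with strict inequality unless $|f|^r = k|g|^s$, for some constant $k$, except on a $\nu$ null set.

First apply the inequality with $f(x, y) = |f_n(x, y)\, p(x \mid y)|^{2(1-t)}$, $g(x, y) = |f_n(x, y)\, q(x \mid y)|^{2t}$, $r = (1 - t)^{-1}$, $s = t^{-1}$, and $\nu = \mu \times \mu$, where $f_n(x, y) = \{\log [q(x \mid y)/p(x \mid y)]\}^n$. This results in the inequality $|||K_{t,n}||| \le |||K_{0,n}|||^{1-t}\, |||K_{1,n}|||^{t}$.

Next apply Hölder's inequality with $f(n) = \{(1/n!)\, |||K_{0,n}|||\, \delta_2^{\,n}\}^{1-t}$, $g(n) = \{(1/n!)\, |||K_{1,n}|||\, \delta_2^{\,n}\}^{t}$, $r = (1 - t)^{-1}$, and $s = t^{-1}$. Here $\nu$ is the measure which assigns mass one to each of the integers $0, 1, 2, \ldots$ and is zero otherwise. Then for $0 \le t \le 1$,

$$\sum_{n=0}^{\infty} \frac{1}{n!} |||K_{t,n}|||\, \delta_2^{\,n} \le \sum_{n=0}^{\infty} \left\{\frac{1}{n!} |||K_{0,n}|||\, \delta_2^{\,n}\right\}^{1-t} \left\{\frac{1}{n!} |||K_{1,n}|||\, \delta_2^{\,n}\right\}^{t} \le \left\{\sum_{n=0}^{\infty} \frac{1}{n!} |||K_{0,n}|||\, \delta_2^{\,n}\right\}^{1-t} \left\{\sum_{n=0}^{\infty} \frac{1}{n!} |||K_{1,n}|||\, \delta_2^{\,n}\right\}^{t} < \infty.$$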

The following lemma, which is proved in Zaanen [9], pages 496-498, is a consequence of the above conditions.

Lemma 2.2. For $0 \le t \le 1$ the integral operator $A_t$ with kernel $K_t(x, y)$ possesses a positive eigenvalue $\lambda(t)$ which is strictly larger in modulus than any other eigenvalue of $A_t$. Furthermore, every eigenfunction corresponding to $\lambda(t)$ is of the form $\psi_t = k\phi_t$ for some constant $k$, where $\phi_t$ is an a.s. $(\mu)$ positive function belonging to $\mathcal{L}_2$.


3. The Class of Consistent Likelihood Ratio Tests.

Lemma 3.1. There exist unique stationary densities $\pi_0$ and $\pi_1$ for $\mathfrak{M}$ under $H_P$ and $H_Q$, respectively, which are positive a.s. $(\mu)$.

Proof. By Lemma 2.2, there exist numbers $\lambda(t) > 0$ and functions $\phi_t$, positive a.s. $(\mu)$, such that $\lambda(t)\phi_t(x) = A_t\phi_t(x) = \int K_t(x, y)\phi_t(y)\, d\mu(y)$. Take the $\mathcal{L}_1$ $(= L_1(A, \mu))$ norm on both sides of this expression. For $t = 0, 1$ this gives $\lambda(t)\, \|\phi_t\|_1 = \int \phi_t(y) [\int K_t(x, y)\, d\mu(x)]\, d\mu(y) = \|\phi_t\|_1$; hence $\lambda(0) = \lambda(1) = 1$.

The proof is concluded by taking $\pi_t = \phi_t / \|\phi_t\|_1$.
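The proof of Lemma 3.1 suggests a direct computation in the finite-state case. There, $A_0$ is a column-stochastic matrix, and normalizing its positive eigenvector in the $\mathcal{L}_1$ norm gives $\pi_0$. The sketch below assumes $\mu$ is counting measure on two states and uses a hypothetical transition matrix. It finds the eigenvector by power iteration, which is a standard technique for positive matrices and is not taken from the paper.

```python
def apply_kernel(K, f):
    """(A f)(x) = sum_y K(x, y) f(y) on a finite state space (counting measure)."""
    return [sum(K[x][y] * f[y] for y in range(len(f))) for x in range(len(K))]

def stationary_density(P, iters=1000):
    """pi_0 for transition probabilities P[y][x] = p(x | y), by power iteration on A_0."""
    size = len(P)
    K = [[P[y][x] for y in range(size)] for x in range(size)]   # K_0(x, y) = p(x | y)
    phi = [1.0 / size] * size
    for _ in range(iters):
        phi = apply_kernel(K, phi)
        total = sum(phi)                 # L1 norm; lambda(0) = 1 keeps it at 1
        phi = [v / total for v in phi]   # pi = phi / ||phi||_1
    return phi

P = [[0.9, 0.1], [0.2, 0.8]]             # hypothetical p(x | y)
pi0 = stationary_density(P)
print(pi0)                               # approximately [2/3, 1/3]
```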


Lemma 3.2. There exist numbers $\gamma_P$ and $\gamma_Q$, independent of the initial densities $p$ and $q$, such that

(i) $\lim_{n \to \infty} R_n/n = \gamma_P$ a.s. $(P)$ when $H_P$ is true, and $\lim_{n \to \infty} R_n/n = \gamma_Q$ a.s. $(Q)$ when $H_Q$ is true, where $P$ and $Q$ are the probability distributions induced on $\mathfrak{M}$ by the densities specified by $H_P$ and $H_Q$;

(ii) $\gamma_P \le 0 \le \gamma_Q$, with strict inequality unless $p(x \mid y) = q(x \mid y)$ a.s. $(\mu \times \mu)$.







Proof. For $t = 0, 1$, $\mathfrak{M}$ with the distribution $P_t$ generated by $K_t(x, y)$ and $\pi_t$ is a metrically transitive stationary stochastic process. Let $E_t$ denote expectation with respect to $P_t$, and let $B_t$ be the integral operator with kernel

$$|\log [q(x \mid y)/p(x \mid y)]|\, K_t(x, y).$$

Then the inequality $E_t |\log [q(X \mid Y)/p(X \mid Y)]| < \infty$ for $t = 0, 1$ follows from the inequalities $(1, B_t \pi_t) \le \|1\|\, \|B_t \pi_t\|$ (Schwarz inequality) and

$$\|B_t \pi_t\| \le \|\pi_t\|\, |||K_{t,1}|||.$$

The last expression is finite by A3. The absence of a subscript on the norm symbol indicates the $\mathcal{L}_2$ norm. Under these conditions Birkhoff's ergodic theorem applies to the function $\log [q(x \mid y)/p(x \mid y)]$, yielding the result

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \log \frac{q(X_k \mid X_{k-1})}{p(X_k \mid X_{k-1})} = E_t \log \frac{q(X \mid Y)}{p(X \mid Y)} \quad \text{a.s. } (P_t) \tag{3.1}$$

when the distribution of $\mathfrak{M}$ is $P_t$, $t = 0, 1$.

Now

$$\frac{R_n}{n} = \frac{1}{n} \log \frac{\pi_1(X_0)}{\pi_0(X_0)} + \frac{1}{n} \sum_{k=1}^{n} \log \frac{q(X_k \mid X_{k-1})}{p(X_k \mid X_{k-1})},$$

and $\pi_0$ and $\pi_1$ are a.s. positive. Hence $\lim R_n/n = E_t \log [q(X \mid Y)/p(X \mid Y)]$ a.s. $(P_t)$ when $P_t$ is the distribution of $\mathfrak{M}$, $t = 0, 1$. Let $\gamma_P = E_0 \log [q(X \mid Y)/p(X \mid Y)]$ and $\gamma_Q = E_1 \log [q(X \mid Y)/p(X \mid Y)]$. This proves Part (i) for the case $P = P_0$ and $Q = P_1$.

Consider any pair of initial densities $p$ and $q$ for which $\log [q/p]$ is finite a.s. $(\mu)$; we show that the result is valid for them. Let $S_t$ be the set in the range of the process for which (3.1) holds. Then Birkhoff's ergodic theorem implies $P_t(S_t) = 1$ for $t = 0, 1$. Write

$$P_t(S_t) = \int P_t(S_t \mid X_0 = x)\, \pi_t(x)\, d\mu(x),$$

where $P_t(S_t \mid X_0)$ is the conditional probability of $S_t$ relative to the sigma field of events generated by $X_0$. Since $\pi_t > 0$ a.s. $(\mu)$, $P_t(S_t \mid X_0) = 1$ a.s. $(\mu)$. Thus, for arbitrary densities $p$ and $q$, $P(S_0) = \int P_0(S_0 \mid X_0 = x)\, p(x)\, d\mu(x) = 1$ and $Q(S_1) = \int P_1(S_1 \mid X_0 = x)\, q(x)\, d\mu(x) = 1$. The proof of Part (i) is completed by noting that Condition A2 implies $\log [q/p]$ is finite a.s. $(\mu)$.

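Because $\log x$ is concave, $E(\log X) \le \log E(X)$. Apply this to $q(X \mid Y)/p(X \mid Y)$ under $P_0$ and to $p(X \mid Y)/q(X \mid Y)$ under $P_1$; each has expectation one. Part (ii) follows.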

To avoid the case $\gamma_P = \gamma_Q = 0$, it is assumed that $p(x \mid y)$ and $q(x \mid y)$ differ on a set of positive $\mu \times \mu$ measure.
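In the finite-state case, $\gamma_P$ and $\gamma_Q$ are stationary averages of $\log[q/p]$ and can be computed directly. The sketch below assumes counting measure on two states and hypothetical transition matrices. It computes both numbers, checks the sign pattern of Lemma 3.2 (ii), and compares $\gamma_P$ with one simulated path drawn under $P$. The path starts in state 0, an arbitrary choice, since by Lemma 3.2 the limit does not depend on the initial density. The initial term $n^{-1}\log[q(X_0)/p(X_0)]$ is omitted because it vanishes in the limit.

```python
import math
import random

def stationary(R, iters=1000):
    """Stationary probabilities of the row-stochastic matrix R[y][x] = r(x | y)."""
    size = len(R)
    pi = [1.0 / size] * size
    for _ in range(iters):
        pi = [sum(pi[y] * R[y][x] for y in range(size)) for x in range(size)]
    return pi

def drift(pi, R, P, Q):
    """E log[q(X|Y)/p(X|Y)] when Y ~ pi and X | Y ~ R[Y] (R is P or Q)."""
    return sum(pi[y] * R[y][x] * math.log(Q[y][x] / P[y][x])
               for y in range(len(pi)) for x in range(len(pi)))

def simulated_rate(P, Q, n, seed=0):
    """(1/n) sum_k log[q(X_k|X_{k-1})/p(X_k|X_{k-1})] along one path drawn under P."""
    rng = random.Random(seed)
    y, total = 0, 0.0
    for _ in range(n):
        x = rng.choices(range(len(P)), weights=P[y])[0]
        total += math.log(Q[y][x] / P[y][x])
        y = x
    return total / n

P = [[0.9, 0.1], [0.2, 0.8]]          # hypothetical p(x | y) = P[y][x]
Q = [[0.6, 0.4], [0.5, 0.5]]          # hypothetical q(x | y) = Q[y][x]
gamma_P = drift(stationary(P), P, P, Q)
gamma_Q = drift(stationary(Q), Q, P, Q)
print(gamma_P, gamma_Q)               # gamma_P < 0 < gamma_Q
```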

Theorem 1. Among the class of tests $T(a, \alpha)$, the consistent ones are:

(i) $T(0, \alpha)$ for $\alpha > 1$,
(ii) $T(a, 1)$ for $\gamma_P < a < \gamma_Q$ (and possibly $a = \gamma_P$),
(iii) $T(a, \alpha)$ for $\alpha < 1$ and $-\infty < a < \infty$.







Proof. Part (ii) is an immediate consequence of Lemma 3.2. For $\alpha > 1$, write

$$P[R_n > n^{\alpha} a] = P[R_n/n > n^{\alpha-1} a].$$

Then for $a < 0$ and $\epsilon > 0$, $n^{\alpha-1} a < \gamma_P - \epsilon$ for $n$ sufficiently large. By Part (i) of Lemma 3.2 this implies $\lim_n P[R_n > n^{\alpha} a] \ge \lim_n P[R_n/n > \gamma_P - \epsilon] = 1$. Similarly, $\lim_n Q[R_n \le n^{\alpha} a] = 1$ for $a > 0$. Hence the only consistent test when $\alpha > 1$ is $T(0, \alpha)$.

For $\alpha < 1$, write $P[R_n > n^{\alpha} a] = P[R_n/n > a/n^{1-\alpha}]$. Let $\epsilon > 0$ satisfy $\gamma_P + \epsilon < 0$. Then for arbitrary $a$, $a/n^{1-\alpha} > \gamma_P + \epsilon$ for sufficiently large $n$. Hence $\lim_n P[R_n > n^{\alpha} a] \le \lim_n P[R_n/n > \gamma_P + \epsilon] = 0$. A similar argument applies to $\lim_n Q[R_n \le n^{\alpha} a]$, which completes the proof.


4. Properties of the Sequence of Moment Generating Functions. Let $M_n(t)$ be the moment generating function of $R_n$ under the hypothesis $H_P$. Then $M_n(t)$ is the real restriction of the bilateral Laplace transform $M_n(z) = \int_{-\infty}^{\infty} e^{zx}\, dF_n(x)$, where $F_n(x) = P[R_n \le x]$. As is well known from the theory of the bilateral Laplace transform, $M_n(z)$ is therefore convergent and analytic in an infinite strip $\{z = t + is : \alpha_n < t < \beta_n\}$, and so is $[M_n(z)]^{1/n}$. Since $M_n(0) = M_n(1) = 1$, $\alpha_n \le 0$ and $\beta_n \ge 1$. The following lemma strengthens this property.

Lemma 4.1. Let $\delta = \min(\delta_1, \delta_2)$, where $\delta_1$ and $\delta_2$ are specified in A2 and A3. Then there exists a constant $M < \infty$ such that $|[M_n(z)]^{1/n}| \le M$ for $-\delta \le \Re(z) \le 1 + \delta$, where $\Re(z)$ denotes the real part of $z$.

Proof. Let $A_z$ be the (complex-valued) integral operator with kernel $K_z(x, y) = p^{1-z}(x \mid y)\, q^{z}(x \mid y)$. It is easily verified that $M_n(z) = (1, A_z^n k_z)$, where $k_z = p^{1-z} q^{z}$. Let $A_{z,n}$ be the integral operator with kernel $K_{z,n}(x, y)$ defined as in A3 with $z$ replacing $t$. Then $\|A_{z,n}\| \le |||K_{z,n}||| = |||K_{t,n}|||$ for $t = \Re(z)$. As a consequence of (C2),

$$\sum_{n=0}^{\infty} \frac{1}{n!}\, \|A_{t,n}\|\, |z - t|^n < \infty \quad \text{for } 0 \le t \le 1$$

and any $z$ satisfying $|z - t| \le \delta_2$. This implies (see Wolf [8]) that the definition of the operator $A_z$ may be extended to $-\delta_2 \le \Re(z) \le 1 + \delta_2$ by the equation

$$A_z = \sum_{n=0}^{\infty} \frac{1}{n!}\, A_{t,n} (z - t)^n, \qquad \text{with} \quad \|A_z\| \le L_t = \sum_{n=0}^{\infty} \frac{1}{n!}\, |||K_{t,n}|||\, \delta_2^{\,n},$$

where $t = \Re(z)$. But as was shown in Lemma 2.1, $L_t \le L_0^{1-t} L_1^{t} \le L = \max(L_0, L_1)$. Hence, for $-\delta_2 \le \Re(z) \le 1 + \delta_2$, $\|A_z\|$ is uniformly bounded by $L$. Now $|M_n(z)| = |(1, A_z^n k_z)| \le \|1\|\, \|k_z\|\, \|A_z\|^n$. Hence, setting

$$C = \sup\{\|k_z\| : -\delta_1 \le \Re(z) \le 1 + \delta_1\},$$

we have $|[M_n(z)]^{1/n}| \le L\, (\|1\|\, C)^{1/n}$ for $-\delta \le \Re(z) \le 1 + \delta$. The proof is completed by letting $M = L \sup_n (\|1\|\, C)^{1/n}$.







Corollary. $\alpha_n \le -\delta$ and $\beta_n \ge 1 + \delta$; hence $[M_n(z)]^{1/n}$ is analytic for $-\delta < \Re(z) < 1 + \delta$, $n = 1, 2, \ldots$.


Lemma 4.2. Let $A_t$ $(0 \le t \le 1)$ be the integral operator with kernel $K_t(x, y)$ and let $\sigma(A_t)$ denote the set of nonzero eigenvalues of $A_t$. Then

(i). there exist integral operators $T_t$ and $B_t$ with kernels of finite double norm such that

$$A_t^n = \lambda^n(t)\, T_t + B_t^n$$

for $n = 1, 2, \ldots$, where $\lambda(t)$, the largest eigenvalue of $A_t$, is positive,

(ii). $T_t f > 0$ for every nonnegative function $f \in \mathcal{L}_2$ that is positive on a set of positive $\mu$ measure, and

(iii). $\sigma(B_t) = \{\nu : \nu \ne \lambda(t),\ \nu \in \sigma(A_t)\}$.


Proof. The index $t$ will be omitted in the proof of this and the next lemma.

(i) The kernel $K^*(x, y)$ of the adjoint operator $A^*$ is $K(y, x)$; hence, both $K(x, y)$ and $K^*(x, y)$ satisfy the conditions of Lemma 2.2. This implies that the largest eigenvalues of $A$ and $A^*$ are positive and, as is well known, are identical. The eigenfunctions $\phi$ and $\phi^*$ corresponding to this eigenvalue for $A$ and $A^*$ are positive and unique up to constant multiples. Since $(\phi, \phi^*) > 0$, $\phi$ and $\phi^*$ may be chosen so that $(\phi, \phi^*) = 1$. Define the operator $T$ by $Tf = (f, \phi^*)\phi$ for $f \in \mathcal{L}_2$ and let $B = A - \lambda T$. Then it is easily seen by induction on $B^n = (A - \lambda T)^n$ that $A^n = \lambda^n T + B^n$, provided

$$TA = AT = \lambda T \tag{4.1}$$

and

$$T^2 = T. \tag{4.2}$$

To prove (4.2), let $f \in \mathcal{L}_2$. Then

$$T^2 f = (Tf, \phi^*)\phi = ((f, \phi^*)\phi, \phi^*)\phi = (f, \phi^*)(\phi, \phi^*)\phi = Tf.$$

A similar computation proves (4.1).

The operators $T$ and $B$ have kernels $K_T(x, y) = \phi^*(y)\phi(x)$ and $K_B(x, y) = K(x, y) - \lambda K_T(x, y)$. These kernels are of finite double norm since $|||K_T||| = \|\phi^*\|\, \|\phi\|$ and $|||K_B||| \le |||K||| + \lambda |||K_T|||$.

(ii) The fact that $Tf > 0$ is clearly a consequence of the positiveness of $\phi$ and $\phi^*$.

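(iii) It remains to be shown that $\sigma(B) = \sigma(A) \sim \{\lambda\}$, where $\sim$ denotes set theoretic difference. Let $\nu \ne 0$, $\nu \ne \lambda$ satisfy $Af = \nu f$ for some nonzero $f \in \mathcal{L}_2$. Then $\nu Tf = \nu(f, \phi^*)\phi = (Af, \phi^*)\phi = (f, A^*\phi^*)\phi = \lambda Tf$. This implies $(\nu - \lambda)Tf = 0$, so $Tf = 0$ since $\nu \ne \lambda$. Thus $Bf = \nu f$, from which it follows that $\sigma(A) \sim \{\lambda\} \subset \sigma(B)$.

If $\nu \ne 0$ satisfies $Bf = \nu f$ for a nonzero $f \in \mathcal{L}_2$, then $Af = \lambda Tf + \nu f$. Applying $T$ to both sides gives $\nu Tf = \lambda Tf - \lambda Tf = 0$. Thus $Af = \nu f$, which implies $\sigma(B) \subset \sigma(A)$.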







But $\lambda$ is not an eigenvalue of $B$. If it were, then $Bf = \lambda f$ for some nonzero $f \in \mathcal{L}_2$. Since $B\phi = A\phi - \lambda T\phi = 0$, $f \ne k\phi$ for all $k$. Now $Af = \lambda Tf + \lambda f$, and applying $T$ to both sides gives $\lambda Tf = 2\lambda Tf$, so $Tf = 0$. Consequently $Af = \lambda f$, but this is impossible since $f \ne k\phi$. Thus $\lambda$ is not in $\sigma(B)$, and the lemma is proved.


Lemma 4.3. $\lim_{n \to \infty} [\|B_t^n\| / \lambda^n(t)] = 0$ for $0 \le t \le 1$.


Proof. The eigenvalues of an integral operator with kernel of finite double norm are isolated in any region of the complex plane not containing the origin. Hence $r_B = \sup\{|\nu| : \nu \in \sigma(B)\}$ is attained for some $\nu' \in \sigma(B)$. By Lemma 4.2, $r_B = |\nu'| < \lambda$. A well-known theorem in the theory of linear operators yields $\lim_n \|B^n\|^{1/n} = r_B$. Let $\epsilon > 0$ satisfy $r_B < r_B + \epsilon < \lambda$. Then for $n$ sufficiently large, $\|B^n\|^{1/n} < r_B + \epsilon$, which implies $\lim_n \|B^n\|/\lambda^n \le \lim_n [(r_B + \epsilon)/\lambda]^n = 0$.
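In finite dimensions, Lemmas 4.2 (i) and 4.3 can be checked numerically. The sketch assumes counting measure on two states and hypothetical transition matrices, and takes $t = 0.5$. It builds $T = \phi\,\phi^{*\mathsf{T}}/(\phi, \phi^*)$ from the right and left Perron vectors, found by power iteration, and sets $B = A - \lambda T$. It then checks $A^n = \lambda^n T + B^n$ and the decay of $|||B^n|||/\lambda^n$. Here the double norm of a matrix is its Frobenius norm, and the adjoint is the transpose because the inner product is real.

```python
P = [[0.9, 0.1], [0.2, 0.8]]          # hypothetical p(x | y) = P[y][x]
Q = [[0.6, 0.4], [0.5, 0.5]]          # hypothetical q(x | y) = Q[y][x]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matpow(X, n):
    R = [[float(i == j) for j in range(len(X))] for i in range(len(X))]
    for _ in range(n):
        R = matmul(R, X)
    return R

def perron(K, iters=2000):
    """Largest eigenvalue and a positive eigenvector of a positive matrix (power iteration)."""
    v, est = [1.0] * len(K), 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(K))]
        est = sum(w) / sum(v)
        total = sum(w)
        v = [x / total for x in w]
    return est, v

def decomposition(K):
    """lambda, T and B of Lemma 4.2 (i) for a matrix kernel: A^n = lambda^n T + B^n."""
    lam, phi = perron(K)
    _, phi_star = perron([list(col) for col in zip(*K)])   # adjoint = transpose
    c = sum(a * b for a, b in zip(phi, phi_star))          # rescale so (phi, phi*) = 1
    T = [[phi[i] * phi_star[j] / c for j in range(len(K))] for i in range(len(K))]
    B = [[K[i][j] - lam * T[i][j] for j in range(len(K))] for i in range(len(K))]
    return lam, T, B

t = 0.5
K = [[P[y][x] ** (1 - t) * Q[y][x] ** t for y in range(2)] for x in range(2)]
lam, T, B = decomposition(K)
for n in (1, 2, 5):
    lhs, Bn = matpow(K, n), matpow(B, n)
    err = max(abs(lhs[i][j] - lam ** n * T[i][j] - Bn[i][j]) for i in range(2) for j in range(2))
    print(n, err)                      # rounding-error level
B60 = matpow(B, 60)
print(sum(x * x for row in B60 for x in row) ** 0.5 / lam ** 60)   # |||B^60||| / lambda^60
```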

Theorem 2. For every pair of initial densities $p$ and $q$ satisfying A2,

(i) $\lim_{n \to \infty} [M_n(t)]^{1/n} = \lambda(t)$ for $0 \le t \le 1$,

(ii) $\lambda(t)$ is convex and continuous and has a continuous first derivative for $0 \le t \le 1$,

(iii) $\lim_{n \to \infty} (d/dt)[M_n(t)]^{1/n} = \lambda'(t)$ for $0 \le t \le 1$.


PROOF. Let $f$ be a nonzero, nonnegative element of $\mathcal{L}_2$. Then
$$(1, A_t^n f) = (1, \lambda^n(t)T_t f + B_t^n f) = \lambda^n(t)\,(1, T_t f + B_t^n f/\lambda^n(t)).$$
Hence, by Lemmas 4.2 and 4.3, $\lim_n (1, A_t^n f)^{1/n} = \lambda(t)$, independent of $f$.

If $f = w_t$, then $(1, A_t^n f) = M_n(t)$; hence $\lim_n [M_n(t)]^{1/n} = \lambda(t)$ for $0 \le t \le 1$. As a consequence of this result, Lemma 4.1, and its corollary, Vitali's theorem (see Titchmarsh [7], page 168) implies that $[M_n(z)]^{1/n}$ tends to a limit $\lambda(z)$ uniformly in any region bounded by a contour interior to $\{z: -\delta < \operatorname{Re}(z) < 1 + \delta\}$. But then $\lambda(z)$ is analytic in this region, which implies the continuity part of (ii).

Each $[M_n(t)]^{1/n}$ is seen to be convex for $0 < t < 1$, since its second derivative is positive. Because convexity is preserved under passage to the limit, $\lambda(t)$ is also convex.

Part (iii) is a consequence of the analyticity and uniformity of convergence of $[M_n(z)]^{1/n}$.
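For a finite-state chain (the case singled out in Section 7A), part (i) of Theorem 2 can be watched numerically. The sketch below takes $M_n(t) = \int P_n^{1-t}Q_n^t$, which is the normalizer of $P_{tn}$ in Lemma 5.2, and assumes (a definition from earlier in the paper, not shown in this excerpt) that $A_t$ has kernel $K_0(x,y)^{1-t}K_1(x,y)^t$, with $K_0$, $K_1$ the transition densities under $H_P$, $H_Q$. The matrices and initial distributions are hypothetical. $M_n(t)$ is computed by a forward recursion over paths, and $\lambda(t)$ by power iteration.

```python
def tilted(P, Q, t):
    """Matrix with entries P[x][y]**(1-t) * Q[x][y]**t (assumed kernel of A_t)."""
    n = len(P)
    return [[P[x][y] ** (1 - t) * Q[x][y] ** t for y in range(n)]
            for x in range(n)]


def perron_root(K, iters=3000):
    """Largest eigenvalue of a positive matrix by power iteration v <- vK."""
    n = len(K)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(v[x] * K[x][y] for x in range(n)) for y in range(n)]
        lam = sum(w)  # v sums to 1, so this is the growth factor
        v = [wi / lam for wi in w]
    return lam


def M_n(P, Q, p0, q0, t, n):
    """M_n(t): sum over paths x_0, ..., x_n of P_n^(1-t) * Q_n^t."""
    K = tilted(P, Q, t)
    v = [p0[x] ** (1 - t) * q0[x] ** t for x in range(len(p0))]
    for _ in range(n):
        v = [sum(v[x] * K[x][y] for x in range(len(v))) for y in range(len(v))]
    return sum(v)


# Hypothetical two-state chains with positive transition matrices.
P = [[0.9, 0.1], [0.2, 0.8]]
Q = [[0.6, 0.4], [0.5, 0.5]]
p0 = [0.5, 0.5]
q0 = [0.3, 0.7]

n = 400
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    root_n = M_n(P, Q, p0, q0, t, n) ** (1.0 / n)
    print(t, round(root_n, 4), round(perron_root(tilted(P, Q, t)), 4))
```

At $t = 0$ and $t = 1$ the tilted matrix is stochastic, so both columns equal 1; for interior $t$ the $n$th root of $M_n(t)$ settles near the Perron root.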

COROLLARY.
$$\lim_n E_P(R_n/n) = \lambda'(0) \quad\text{and}\quad \lim_n E_Q(R_n/n) = \lambda'(1),$$
where $E_P$ and $E_Q$ denote expectations with respect to the distributions defined on $M$ by $H_P$ and $H_Q$.

PROOF. Since a bilateral Laplace transform may be differentiated under the integral sign in its region of convergence, the corollary is an immediate consequence of Part (iii) of Theorem 2.





DISCRIMINATION FOR MARKOV PROCESSES 989 


5. Limiting Rate of Convergence for the Tail Probabilities of Consistent Test 
Sequences. 
LEMMA 5.1.

(i) $\limsup_{n\to\infty} P[R_n > na]^{1/n} \le \inf_{0 \le t \le 1} e^{-at}\lambda(t)$,

(ii) $\limsup_{n\to\infty} Q[R_n \le na]^{1/n} \le \inf_{0 \le t \le 1} e^{at}\lambda(1 - t)$ for all real $a$.


PROOF. For an arbitrary random variable $X$ with distribution $P$ on $(\mathcal{X}, \mathcal{A})$, a well-known inequality (cf. Loève [5], page 157) is $P[X \ge 0] \le Ee^{tX}$ for $t \ge 0$.

Set $X = R_n - na$ and let $E_P$ denote expectation with respect to the distribution defined by $H_P$. Then $P[R_n > na] \le E_P\exp[(R_n - na)t] = e^{-nat}M_n(t)$ for $t \ge 0$. Hence, by Theorem 2, $\limsup_n P[R_n > na]^{1/n} \le e^{-at}\lim_n [M_n(t)]^{1/n} = e^{-at}\lambda(t)$ for $0 \le t \le 1$, which implies Inequality (i). Part (ii) is proved in the same manner by setting $X = -(R_n - na)$ and noting that $E_Q\exp(-R_n t) = M_n(1 - t)$.
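The first step of this proof, $P[R_n > na] \le e^{-nat}M_n(t)$, holds exactly for every $n$, not only in the limit. A minimal sketch checks it in the simplest special case, i.i.d. observations (a Markov process whose transition density does not depend on the current state). It assumes $R_n = \log(Q_n/P_n)$, which is consistent with the proofs of Lemmas 5.1 and 5.2, so that $R_n$ is a sum of $n + 1$ log-likelihood ratios and $M_n(t) = \lambda(t)^{n+1}$ with $\lambda(t) = \sum_y p(y)^{1-t}q(y)^t$. The Bernoulli parameters are hypothetical.

```python
import math

# Hypothetical i.i.d. Bernoulli model: P(x = 1) under H_P and under H_Q.
P1, Q1 = 0.5, 0.8
LLR1 = math.log(Q1 / P1)              # log q(1)/p(1)
LLR0 = math.log((1 - Q1) / (1 - P1))  # log q(0)/p(0)


def lam(t):
    """lambda(t) = sum_y p(y)^(1-t) q(y)^t in the i.i.d. case."""
    return P1 ** (1 - t) * Q1 ** t + (1 - P1) ** (1 - t) * (1 - Q1) ** t


def exact_tail(n, a):
    """P[R_n > n a] under H_P, where R_n sums n + 1 log-likelihood ratios."""
    m = n + 1
    total = 0.0
    for k in range(m + 1):  # k = number of ones among x_0, ..., x_n
        if k * LLR1 + (m - k) * LLR0 > n * a:
            total += math.comb(m, k) * P1 ** k * (1 - P1) ** (m - k)
    return total


def bound(n, a, t):
    """e^{-n a t} M_n(t), with M_n(t) = lambda(t)^(n + 1) here."""
    return math.exp(-n * a * t) * lam(t) ** (n + 1)


for n in (10, 40):
    for a in (-0.1, 0.0, 0.1):
        best = min(bound(n, a, i / 100) for i in range(101))
        print(n, a, exact_tail(n, a), best)
```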

The remainder of this section is devoted to showing that the limits $\lim_n P[R_n > na]^{1/n}$ and $\lim_n Q[R_n \le na]^{1/n}$ exist and are given by the expressions in Lemma 5.1. The method of proof will depend upon a modification of an inequality due to Thomasian [6].

To simplify the notation in the proofs of Lemmas 5.2 and 5.3, let
$$P_n = p(x_0)\,p(x_1 \mid x_0) \cdots p(x_n \mid x_{n-1})$$
and
$$Q_n = q(x_0)\,q(x_1 \mid x_0) \cdots q(x_n \mid x_{n-1}).$$
Integrals for which no measure is indicated are taken with respect to $d\mu^{n+1}$, where $\mu^{n+1}$ is the $(n+1)$-dimensional product of $\mu$.


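LEMMA 5.2. For every real $a$, $t$ $(0 < t < 1)$, and $n$ $(n = 1, 2, \cdots)$,

(i) $P[R_n > na] \ge e^{-nbt}M_n(t)\,P_{tn}[a < R_n/n \le b]$ for every $b > a$,

and

(ii) $Q[R_n \le na] \ge e^{nbt}M_n(1 - t)\,P_{1-t,n}[b < R_n/n \le a]$ for every $b < a$,

where
$$P_{tn}(A) = \frac{1}{M_n(t)}\int_A P_n^{1-t}Q_n^t \quad\text{for } A \in \mathcal{A}^{n+1}.$$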


PROOF. To prove Part (i), let
$$A_n = [R_n > na] \quad\text{and}\quad B_n = [a < R_n/n \le b].$$







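Then $A_n \supset B_n$ and
$$P[R_n > na] = \int_{A_n} P_n = M_n(t)\int_{A_n}\left(\frac{Q_n}{P_n}\right)^{-t} dP_{tn} \ge M_n(t)\int_{B_n}\left(\frac{Q_n}{P_n}\right)^{-t} dP_{tn} \ge e^{-nbt}M_n(t)\,P_{tn}(B_n).$$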


Part (ii) is obtained by altering this proof slightly. 
LEMMA 5.3. For every real $a$, $t$ $(0 < t < 1)$, $s$ $(0 < s < \min(t, 1 - t))$, and $n$ $(n = 1, 2, \cdots)$,

(i) $$P[R_n > na] \ge e^{-nbt}M_n(t)\left[1 - e^{nas}\frac{M_n(t - s)}{M_n(t)} - e^{-nbs}\frac{M_n(t + s)}{M_n(t)}\right]$$
for every $b > a$, and

(ii) $$Q[R_n \le na] \ge e^{nbt}M_n(1 - t)\left[1 - e^{-nas}\frac{M_n(1 - t + s)}{M_n(1 - t)} - e^{nbs}\frac{M_n(1 - t - s)}{M_n(1 - t)}\right]$$
for every $b < a$.
PROOF. Let $C_n = [R_n \le na]$ and $D_n = [R_n > nb]$. From Lemma 5.2(i),

(5.1) $P[R_n > na] \ge e^{-nbt}M_n(t)\,(1 - P_{tn}(C_n) - P_{tn}(D_n))$.

But, since $(P_n/Q_n)^s = e^{-sR_n} \ge e^{-nas}$ on $C_n$,
$$P_{tn}(C_n) = \frac{1}{M_n(t)}\int_{C_n} P_n^{1-t}Q_n^t \le \frac{e^{nas}}{M_n(t)}\int_{C_n} P_n^{1-t+s}Q_n^{t-s} \le e^{nas}\frac{M_n(t - s)}{M_n(t)}$$
for $0 < s < \min(t, 1 - t)$. Similarly, $P_{tn}(D_n) \le e^{-nbs}[M_n(t + s)/M_n(t)]$. The proof of Part (i) is completed by substituting these inequalities into Expression (5.1).

Again, the proof of Part (ii) follows from the proof of Part (i) with only minor 
changes. 

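THEOREM 3. Let $m_P(a, \alpha) = \lim_n P[R_n > n^\alpha a]^{1/n}$ and
$$m_Q(a, \alpha) = \lim_n Q[R_n \le n^\alpha a]^{1/n}.$$
Then

(i) $m_P(a, 1) = \inf_{0 \le t \le 1} e^{-at}\lambda(t)$ and $m_Q(a, 1) = \inf_{0 \le t \le 1} e^{at}\lambda(1 - t)$ for $\lambda'(0) < a < \lambda'(1)$,

(ii) $m_P(a, \alpha) = m_Q(a, \alpha) = \inf_{0 \le t \le 1}\lambda(t)$ for $\alpha < 1$ and all real $a$.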

PROOF. To prove the first part of (i), it suffices to show that for every $a$ and $b$ for which $\lambda'(0) \le a < b \le \lambda'(1)$, there exists $t^*$ $(0 < t^* < 1)$ such that for all sufficiently small $s > 0$,

(5.2) $$\lim_n e^{nas}\left[\frac{M_n(t^* - s)}{M_n(t^*)}\right] = \lim_n e^{-nbs}\left[\frac{M_n(t^* + s)}{M_n(t^*)}\right] = 0.$$





For then, by Lemma 5.3, $\liminf_n P[R_n > na]^{1/n} \ge e^{-bt^*}\lambda(t^*) \ge \inf_{0 \le t \le 1} e^{-bt}\lambda(t)$ for every $b > a$, which implies $\liminf_n P[R_n > na]^{1/n} \ge \inf_{0 \le t \le 1} e^{-at}\lambda(t)$.
If $c_n(t) = (1, A_t^n w_t)/\lambda^n(t)$, then $M_n(t) = c_n(t)\lambda^n(t)$ and, by Lemmas 4.2 and 4.3, $\lim_n c_n(t) > 0$ for $0 \le t \le 1$. Thus, if we write
$$e^{nas}\left[\frac{M_n(t - s)}{M_n(t)}\right] = \left[\frac{c_n(t - s)}{c_n(t)}\right]\left\{\left[\frac{\lambda(t - s)}{\lambda(t)}\right]e^{as}\right\}^n$$
and
$$e^{-nbs}\left[\frac{M_n(t + s)}{M_n(t)}\right] = \left[\frac{c_n(t + s)}{c_n(t)}\right]\left\{\left[\frac{\lambda(t + s)}{\lambda(t)}\right]e^{-bs}\right\}^n,$$
a sufficient condition for Equation (5.2) to hold is
$$[\log\lambda(t^*) - \log\lambda(t^* - s)]/s > a \quad\text{and}\quad [\log\lambda(t^* + s) - \log\lambda(t^*)]/s < b$$
for some $t^*$ $(0 < t^* < 1)$ and sufficiently small $s > 0$. But by Theorem 2(ii), $d\log\lambda(t)/dt = \lambda'(t)/\lambda(t)$ is continuous for $0 \le t \le 1$; hence, this condition can always be met for $a$ and $b$ in the specified range. The second part of (i) follows by a similar argument.

From the inequality quoted in the proof of Lemma 5.1, $P[R_n > n^\alpha a] \le \exp(-n^\alpha at)\,M_n(t)$ for $t \ge 0$. Hence for $\alpha < 1$, $\limsup_n P[R_n > n^\alpha a]^{1/n} \le \lim_n \exp(-n^{\alpha-1}at)\,\lim_n [M_n(t)]^{1/n} = \lambda(t)$ for $0 \le t \le 1$ and all real $a$. This implies $\limsup_n P[R_n > n^\alpha a]^{1/n} \le \inf_{0 \le t \le 1}\lambda(t)$.

Now for every $\epsilon > 0$, $n^{\alpha-1}a < \epsilon$ for sufficiently large $n$, whatever be $a$. Hence $\liminf_n P[R_n > n^\alpha a]^{1/n} \ge \lim_n P[R_n > n\epsilon]^{1/n}$, and the first part of (ii) follows from (i) by letting $\epsilon \downarrow 0$. Again, a similar proof holds for the second part of (ii).
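Theorem 3(i) identifies the exact exponential rate, and this can be observed in a hypothetical i.i.d. Bernoulli model (again assuming $R_n = \log(Q_n/P_n)$ is a sum of $n + 1$ log-likelihood ratios, and $\lambda(t) = \sum_y p(y)^{1-t}q(y)^t$). Here $\lambda'(0) \approx -0.223$ and $\lambda'(1) \approx 0.193$, so the values of $a$ below lie in the range of part (i). The sketch computes $P[R_n > na]$ exactly in log space for large $n$ and compares $P[R_n > na]^{1/n}$ with $\inf_{0 \le t \le 1} e^{-at}\lambda(t)$, the infimum being approximated on a grid of $t$ values.

```python
import math

P1, Q1 = 0.5, 0.8  # hypothetical Bernoulli parameters under H_P and H_Q
LLR1, LLR0 = math.log(Q1 / P1), math.log((1 - Q1) / (1 - P1))


def lam(t):
    """lambda(t) = sum_y p(y)^(1-t) q(y)^t in the i.i.d. case."""
    return P1 ** (1 - t) * Q1 ** t + (1 - P1) ** (1 - t) * (1 - Q1) ** t


def log_tail(n, a):
    """log P[R_n > n a] under H_P, summed exactly in log space."""
    m = n + 1
    logs = [math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(P1) + (m - k) * math.log(1 - P1)
            for k in range(m + 1) if k * LLR1 + (m - k) * LLR0 > n * a]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))


def rate(a, grid=2001):
    """inf over 0 <= t <= 1 of e^{-a t} lambda(t), on a uniform grid."""
    return min(math.exp(-a * t) * lam(t)
               for t in (i / (grid - 1) for i in range(grid)))


n = 4000
for a in (-0.1, 0.0, 0.1):
    print(a, round(math.exp(log_tail(n, a) / n), 4), round(rate(a), 4))
```

The agreement improves only slowly with $n$, because the exact tail carries a polynomial prefactor that disappears from the $n$th root only in the limit.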

It is not yet clear that the exponential rate of convergence embraces all consistent tests. It is easy to show that $\nu_P \le \lambda'(0)$ and $\lambda'(1) \le \nu_Q$, but a priori the inequalities might be strict. The following theorem resolves this point.
THEOREM 4. $\nu_P = \lambda'(0)$ and $\nu_Q = \lambda'(1)$.

PROOF. The proof will be carried out only for the first equality, since the proof of the second equality is quite similar.

By means of the inequality $P[X \ge 0] \le Ee^{tX}$, $t \ge 0$, it is seen by the method of proof of Lemma 5.1 that for all $k \ge 0$, $P[R_n/n \ge k] \le e^{-nk}$. The same inequality yields $P[R_n/n \le a] \le e^{nat}M_n(-t)$ for $t \ge 0$, where it will be assumed that $a < \nu_P \le 0$. From the proof of Theorem 2(i), $\lim_n [M_n(-\delta/2)]^{1/n} = \lambda(-\delta/2)$. Hence for $\epsilon > 0$ and $n$ sufficiently large,
$$P[R_n/n \le a] \le e^{na\delta/2}[\lambda(-\delta/2) + \epsilon]^n.$$
Select $a$ so small that $e^{a\delta/2}[\lambda(-\delta/2) + \epsilon] < 1$ and let $b$ be any positive integer. Define the sequence of bounded random variables $Z_n$ by $Z_n = R_n/n$ for $a \le R_n/n \le b$ and $Z_n = 0$ otherwise. Then, since $\lim_n R_n/n = \nu_P$ a.s. $(P)$ and $a < \nu_P < b$, $\lim_n Z_n = \nu_P$ a.s. $(P)$. Hence, by the bounded convergence theorem, $\lim_n E_P Z_n = \nu_P$. Now
$$\begin{aligned}
|E_P(R_n/n) - \nu_P| &\le |E_P Z_n - \nu_P| + |E_P(R_n/n) - E_P Z_n| \\
&\le |E_P Z_n - \nu_P| + \sum_{k=0}^{\infty}(|a| + k + 1)\,P\left[a - k - 1 \le \frac{R_n}{n} < a - k\right] + \sum_{k=b}^{\infty}(k + 1)\,P\left[\frac{R_n}{n} > k\right] \\
&\le |E_P Z_n - \nu_P| + e^{na\delta/2}\left[\lambda\left(-\frac{\delta}{2}\right) + \epsilon\right]^n\left\{|a|(1 - e^{-n\delta/2})^{-1} + (1 - e^{-n\delta/2})^{-2}\right\} + \sum_{k=b}^{\infty}(k + 1)e^{-nk},
\end{aligned}$$
and this upper bound tends to zero with $n$. Hence $\lim_n E_P R_n/n = \nu_P$, as was to be shown.
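For a finite ergodic chain, Theorem 4 can be checked numerically. The sketch assumes that $R_n = \log(Q_n/P_n)$ and that $\lambda(t)$ is the Perron root of the matrix with entries $K_0(x,y)^{1-t}K_1(x,y)^t$ (definitions from earlier in the paper, not shown in this excerpt). Under these assumptions the ergodic theorem gives $\nu_P = \sum_x \pi(x)\sum_y K_0(x,y)\log[K_1(x,y)/K_0(x,y)]$, with $\pi$ the stationary distribution of $K_0$. The code compares this with a central-difference estimate of $\lambda'(0)$. The matrices are hypothetical.

```python
import math


def tilted(P, Q, t):
    """Entries P[x][y]**(1-t) * Q[x][y]**t (assumed kernel of A_t)."""
    n = len(P)
    return [[P[x][y] ** (1 - t) * Q[x][y] ** t for y in range(n)] for x in range(n)]


def perron_root(K, iters=3000):
    """Largest eigenvalue of a positive matrix by power iteration v <- vK."""
    n = len(K)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(v[x] * K[x][y] for x in range(n)) for y in range(n)]
        lam = sum(w)
        v = [wi / lam for wi in w]
    return lam


def stationary(P, iters=3000):
    """Stationary distribution of a positive stochastic matrix by iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
    return pi


P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical K_0 (under H_P)
Q = [[0.6, 0.4], [0.5, 0.5]]  # hypothetical K_1 (under H_Q)

h = 1e-4
lam_prime_0 = (perron_root(tilted(P, Q, h)) - perron_root(tilted(P, Q, -h))) / (2 * h)

pi = stationary(P)
nu_P = sum(pi[x] * P[x][y] * math.log(Q[x][y] / P[x][y])
           for x in range(2) for y in range(2))
print(round(lam_prime_0, 6), round(nu_P, 6))
```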


6. Definition and Evaluation of the Asymptotic Rate of Discrimination. A possible criterion for selecting a “best” test from the class of consistent tests is the minimax principle based on a loss function involving the asymptotic values $m_P(a, \alpha)$ and $m_Q(a, \alpha)$.² Since the asymptotic behavior of the sequences $T(a, \alpha)$ for $\alpha < 1$ is equivalent to that of the sequence $T(0, 1)$, it suffices to restrict attention to $T(a, 1)$ for $\nu_P < a < \nu_Q$. The asymptotic rate of discrimination of the class of consistent likelihood ratio tests is defined to be the minimax rate
$$\rho(P, Q) = \inf_{\nu_P < a < \nu_Q}\max\{m_P(a, 1), m_Q(a, 1)\}.$$


THEOREM 5. $\rho(P, Q) = \inf_{0 \le t \le 1}\lambda(t)$.

PROOF.
$$m_Q(a, 1) = \inf_{0 \le t \le 1} e^{at}\lambda(1 - t) = e^a\inf_{0 \le t \le 1} e^{-at}\lambda(t) = e^a m_P(a, 1).$$
Thus $m_Q(0, 1) = m_P(0, 1) = \inf_{0 \le t \le 1}\lambda(t)$. The theorem now follows from the fact that $m_Q(a, 1)$ is nondecreasing and $m_P(a, 1)$ is nonincreasing in $a$.
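The identity $m_Q(a, 1) = e^a m_P(a, 1)$ and the resulting minimax value can be illustrated numerically. The sketch below uses the hypothetical i.i.d. Bernoulli special case, for which $\lambda(t) = \sum_y p(y)^{1-t}q(y)^t$. It approximates the infima over $0 \le t \le 1$ on a symmetric grid. It checks the algebra of the proof, not the limit theorems themselves.

```python
import math

p = [0.5, 0.5]  # hypothetical p(y), y in {0, 1}
q = [0.2, 0.8]  # hypothetical q(y)
GRID = [i / 2000 for i in range(2001)]


def lam(t):
    """lambda(t) = sum_y p(y)^(1-t) q(y)^t in the i.i.d. case."""
    return sum(pi ** (1 - t) * qi ** t for pi, qi in zip(p, q))


def m_P(a):
    """inf_t e^{-a t} lambda(t), as in Theorem 3(i)."""
    return min(math.exp(-a * t) * lam(t) for t in GRID)


def m_Q(a):
    """inf_t e^{a t} lambda(1 - t), as in Theorem 3(i)."""
    return min(math.exp(a * t) * lam(1 - t) for t in GRID)


rho = min(lam(t) for t in GRID)
for a in (-0.1, -0.05, 0.0, 0.05, 0.1):
    print(a, round(m_P(a), 5), round(m_Q(a), 5), round(max(m_P(a), m_Q(a)), 5))
print("rho =", round(rho, 5))
```

The printed maximum is smallest at $a = 0$, where both error exponents meet at $\rho$.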

The asymptotic rate $\rho(P, Q)$ is achieved by the sequences $T(a, \alpha)$ for $\alpha < 1$ and $-\infty < a < \infty$, so the “best” test is actually an equivalence class of tests which are indistinguishable on the basis of the asymptotic rates at which their error probabilities tend to zero.


7. Application of the Theory and Extensions to Infinite Measures. 
A. A class of processes for which the theory is immediately applicable is the 
one whose members possess densities bounded from above and away from zero 


² The referee points out, for those averse to using the minimax principle, that the general use of $\rho$ may be justified by its relevance for any Bayes strategy corresponding to fixed nonzero a priori probabilities and fixed nonzero costs of error.







on a set of finite measure. More precisely, if there exists a constant $C_1 > 1$ such that for $t = 0, 1$, $1/C_1 \le K_t(x, y) \le C_1$ on $A \times A$ and $K_t(x, y) = 0$ on $(A \times A)^c$, where $\mu(A) < \infty$, then for $p$ and $q$ satisfying $1/C_2 \le p \le C_2$ and $1/C_3 \le q \le C_3$ on $A$ for constants $C_2 > 1$ and $C_3 > 1$, Conditions A1, A2, A3 are satisfied. Condition A3 follows from the inequality


ES Ill Kew ill S (a) Ci y eee 


Important members of this class are finite Markov chains for which $H_P$ and $H_Q$ specify transition matrices composed of positive elements. In this case, $A$ consists of a finite set of numbers, and $\mu$ is the measure assigning mass one to the elements of $A$ and zero otherwise.
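For such finite chains, $\lambda(t)$ is the largest (Perron) eigenvalue of a positive matrix, so $\rho(P, Q) = \inf_{0 \le t \le 1}\lambda(t)$ from Theorem 5 can be computed directly. This is a minimal sketch. It assumes that the operator $A_t$ has kernel $K_0(x,y)^{1-t}K_1(x,y)^t$, a definition made earlier in the paper and not shown in this excerpt. The matrices are hypothetical, and the infimum is taken over a grid of $t$ values.

```python
def tilted(P, Q, t):
    """Entries P[x][y]**(1-t) * Q[x][y]**t (assumed kernel of A_t)."""
    n = len(P)
    return [[P[x][y] ** (1 - t) * Q[x][y] ** t for y in range(n)] for x in range(n)]


def perron_root(K, iters=500):
    """Largest eigenvalue of a positive matrix, by power iteration v <- vK."""
    n = len(K)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(v[x] * K[x][y] for x in range(n)) for y in range(n)]
        lam = sum(w)
        v = [wi / lam for wi in w]
    return lam


def asymptotic_rate(P, Q, grid=101):
    """rho(P, Q) approximated as the minimum of lambda(t) over a t-grid in [0, 1]."""
    ts = [i / (grid - 1) for i in range(grid)]
    return min(perron_root(tilted(P, Q, t)) for t in ts)


# Hypothetical three-state chains with all transition probabilities positive.
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.2, 0.7]]
Q = [[0.3, 0.3, 0.4],
     [0.4, 0.4, 0.2],
     [0.3, 0.3, 0.4]]
print(round(asymptotic_rate(P, Q), 4))
print(round(asymptotic_rate(P, P), 4))
```

When the two hypotheses coincide the rate is 1, meaning no exponential discrimination is possible. Distinct positive chains give a rate strictly below 1.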

B. Many processes for which $\mu(A) = \infty$ can be brought under the restrictions of Assumptions A1 through A3 by truncation. If there exists a subset $A'$ of $A$ and a constant $C > 1$ such that $1/C \le K_t(x, y) \le C$ on $A' \times A'$ and $0 < \mu(A') < \infty$, then by defining $K'_t(x, y) = K_t(x, y)/\int_{A'} K_t(x, y)\,d\mu(y)$ on $A' \times A'$ and zero on $(A' \times A')^c$ for $t = 0$ and $1$, the process restricted to $A'$ with transition densities $K'_t(x, y)$ and initial densities $p'$ and $q'$ constructed by truncating the original initial densities satisfies all three conditions.

The truncation of a process is reasonable in many instances. For example, if a 
Markov process with densities defined on a set of infinite measure is used to 
approximate a real system for which observations outside a bounded interval 
are physically impossible, the restriction of the process to this interval by trunca- 
tion may yield a more appropriate model. 

C. Let $\mu$ be Lebesgue measure, $A$ any interval, and $A'$ a subinterval of $A$ for which $\mu(A') < \infty$. Let $\psi$ be a one-to-one mapping of $A$ onto $A'$ such that $\phi = \psi^{-1}$ has a derivative bounded away from zero on $A'$ a.s. $(\mu)$ by some positive constant. Then if the Markov process for which hypotheses $H_P$ and $H_Q$ specify $K_t(x, y)$ on $A \times A$ is replaced by a process for which new hypotheses $\tilde H_P$ and $\tilde H_Q$ define $\tilde K_t(u, v) = K_t(\phi(u), \phi(v))\,\phi'(v)$ on $A' \times A'$ for $t = 0, 1$, and if Assumptions A1, A2, and A3 are satisfied by $\tilde K_t(u, v)$, $\tilde p(u) = p(\phi(u))\phi'(u)$ and $\tilde q(u) = q(\phi(u))\phi'(u)$, then all the results of Theorems 1 through 5 apply to the original process as well as its replacement. This follows from the fact that the probabilities $\tilde P[\tilde R_n > n^\alpha a]$ and $\tilde Q[\tilde R_n \le n^\alpha a]$ generated under $\tilde H_P$ and $\tilde H_Q$ are equal, respectively, to $P[R_n > n^\alpha a]$ and $Q[R_n \le n^\alpha a]$, defined in terms of the original process.

D. Another way of handling the case $\mu(A) = \infty$ is to impose additional restrictions on the transition densities to compensate for the fact that $L_1(A, \mu)$ no longer properly contains $L_2(A, \mu)$ and that the function which is identically 1 is no longer integrable. The details of the theory have been carried out by Koopmans [4] for equal initial densities using slightly different methods, and the results coincide with those for $\mu(A) < \infty$, except that it is not known whether the inequalities $\nu_P \le \lambda'(0)$ and $\lambda'(1) \le \nu_Q$ can always be strengthened to equalities as was done in Theorem 4.







8. Acknowledgments. The author is indebted to Professor David Blackwell 
for suggesting the problem and for his guidance during its solution and to Dr. 
Leo Breiman for many helpful suggestions and stimulating discussions. Thanks 
are also due to the referee for a number of clarifying remarks and for suggesting 
the simplified proof of Theorem 4. 


REFERENCES 


1. Chernoff, Herman, “A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations,” Ann. Math. Stat., Vol. 23 (1952), pp. 493-507.
2. Cramér, H., “Sur un nouveau théorème-limite de la théorie des probabilités,” Actualités Scientifiques et Industrielles, No. 736, Paris, 1938.
3. Hardy, G. H., Littlewood, J. E., and Pólya, G., Inequalities, Cambridge University Press, 1934.
4. Koopmans, L. H., “Asymptotic rate of discrimination for Markov processes,” unpublished doctoral dissertation, University of California, Berkeley, 1958.
5. Loève, Michel, Probability Theory, Van Nostrand, New York, 1955.
6. Thomasian, Aram John, “On the magnitude of the sum of error probabilities,” unpublished doctoral dissertation, University of California, Berkeley, 1956.
7. Titchmarsh, E. C., The Theory of Functions, 2nd ed., Oxford University Press, London, 1959.
8. Wolf, František, “Analytic perturbation of operators in Banach spaces,” Mathematische Annalen, Vol. 124 (1952), pp. 317-333.
9. Zaanen, Adriaan Cornelis, Linear Analysis, North Holland Pub. Co., Amsterdam, 1953.





EQUALITIES FOR STATIONARY PROCESSES SIMILAR TO AN
EQUALITY OF WALD
By Shu-Teh Chen Moy
Wayne State University¹

I. Introduction. Let $\Omega$ be a non-empty set with elements $\omega$, $\mathfrak{F}$ be a $\sigma$-algebra of subsets of $\Omega$, and $P$ be a probability measure on $\mathfrak{F}$. Let $T$ be a one to one map of $\Omega$ onto $\Omega$ which, together with its inverse $T^{-1}$, is $\mathfrak{F}$-measurable and $P$ measure preserving. For any random variable (real $\mathfrak{F}$-measurable function) $X$ on $\Omega$, let $TX$ be the function on $\Omega$ defined by $TX(\omega) = X(T\omega)$, so that $[TX \in B] = T^{-1}[X \in B]$ for any Borel set $B$. Consider an $\mathfrak{F}$-measurable set $E$ with $P(E) > 0$. For any $\omega \in E$ consider the images of $\omega$ under iterates of $T$: $T\omega, T^2\omega, \cdots, T^n\omega, \cdots$. If $n_1$ is the smallest positive integer for which $T^{n_1}\omega \in E$, we say that the first recurrence time $\nu_1$ of $E$ is equal to $n_1$. The Poincaré recurrence theorem ([2], p. 10) asserts that $\nu_1$ is well defined and finite almost everywhere on $E$. In fact the stronger version of the Poincaré recurrence theorem asserts that, for almost all $\omega \in E$, there are infinitely many positive integers $n$ such that $T^n\omega \in E$. Let us write down these integers according to their natural order, $n_1, n_1 + n_2, n_1 + n_2 + n_3, \cdots$. Then $n_k$ is defined to be the value of the $k$th recurrence time $\nu_k$ of $\omega$. Thus the successive recurrence times of $E$, $\nu_1, \nu_2, \cdots$, are well defined almost everywhere on $E$. If we introduce the conditional probability measure given $E$, $P_E$, on $\mathfrak{F}$ by

(1) $$P_E(A) = P(E \cap A)/P(E),$$

then $\nu_1, \nu_2, \cdots$ are well defined and finite valued with $P_E$ probability one on the whole space $\Omega$. In [3] it was proved that $\{\nu_k\}$ is a stationary sequence under $P_E$ measure. In this paper we shall introduce a $P_E$ measure preserving transformation $S$ which is associated with $\{\nu_k\}$ in a very natural way. It is shown that $S^{k-1}\nu_1 = \nu_k$, $k = 1, 2, \cdots$, so that the stationarity of $\{\nu_k\}$ is actually due to the $P_E$ measure preserving property of $S$. Let $X_n = T^nX$. It is then shown that the sequences $\{X_{\nu_1 + \cdots + \nu_k}\}$ and $\{X_{\nu_1 + \cdots + \nu_k + 1} + \cdots + X_{\nu_1 + \cdots + \nu_{k+1}}\}$ are stationary under $P_E$ measure. This leads to equalities (13) and (15), which resemble an equality of Wald for an independent sequence of random variables [1]. In fact, the proofs of (13) and (15) are also rather similar to the proof given in [1].


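II. The Transformation S. Let $\bar E$, $\tilde E$ be subsets of $\Omega$ defined by

(2) $$\bar E = E \cap \left(\bigcup_{n=1}^{\infty} T^{-n}E\right),$$

(3) $$\tilde E = E \cap \left(\bigcup_{n=1}^{\infty} T^{n}E\right).$$

Received March 15, 1960.
¹ Now at Syracuse University.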







$\bar E$ may be decomposed into disjoint, countably many pieces, $D_1, D_2, \cdots$, where

(4) $$D_n = E \cap T^{-1}E' \cap \cdots \cap T^{-(n-1)}E' \cap T^{-n}E$$

with $E' = \Omega - E$. Similarly, $\tilde E$ may be decomposed into disjoint, countably many pieces, $F_1, F_2, \cdots$, where

(5) $$F_n = T^{n}E \cap T^{n-1}E' \cap \cdots \cap TE' \cap E.$$

We shall define a one to one map $S$ of $\bar E$ onto $\tilde E$ as follows:

(6) $$S\omega = T^n\omega \quad\text{if } \omega \in D_n, \quad n = 1, 2, 3, \cdots.$$

In other words, $S$ is identical to $T^n$ on $D_n$. It is clear that $D_n$ is mapped onto $F_n$ under $S$. $\bar E$ consists of all points of $E$ for which there is a positive integer $n$ such that $T^n\omega \in E$ and, therefore, has $P_E$ measure one according to the Poincaré recurrence theorem. Applying the same theorem to $T^{-1}$, we conclude that $\tilde E$ is also of $P_E$ measure one. Hence $S$ and its integral (positive or negative) powers are well defined with $P_E$ probability one on $\Omega$.
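On a finite space the construction of $S$ can be carried out mechanically. The sketch below uses a hypothetical model, not taken from the paper: $\Omega = \{0, \dots, 7\}$ with the uniform measure, $T$ a single 8-cycle (so $T$ is one to one, measure preserving, and ergodic), and $E = \{0, 1, 2\}$. It computes $\nu_1$ and $S$ from (6). It then checks that $S$ permutes $E$, which is why it preserves $P_E$ as Lemma 1 asserts, and checks the relation $S^{k-1}\nu_1 = \nu_k$ stated in Theorem 2.

```python
# Hypothetical finite model: T sends ORDER[i] to ORDER[i + 1] (cyclically).
ORDER = [0, 5, 2, 7, 1, 4, 6, 3]
T = {ORDER[i]: ORDER[(i + 1) % len(ORDER)] for i in range(len(ORDER))}
E = {0, 1, 2}


def first_return(w):
    """Return (nu_1(w), S(w)): S(w) = T^n w when w lies in D_n."""
    n, x = 1, T[w]
    while x not in E:
        n, x = n + 1, T[x]
    return n, x


NU1 = {w: first_return(w)[0] for w in E}
S = {w: first_return(w)[1] for w in E}


def recurrence_times(w, k):
    """nu_1(w), ..., nu_k(w): gaps between successive visits of T^j w to E."""
    times, gap, x = [], 0, w
    while len(times) < k:
        gap, x = gap + 1, T[x]
        if x in E:
            times.append(gap)
            gap = 0
    return times


def S_power(w, j):
    """Apply S to w j times."""
    for _ in range(j):
        w = S[w]
    return w


print(NU1, S)
print({w: recurrence_times(w, 6) for w in E})
```

The sum of $\nu_1$ over $E$ equals the size of $\Omega$ here, in line with $\int \nu_1\,dP_E = 1/P(E)$ for ergodic $T$.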

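LEMMA 1. If $A \in \mathfrak{F}$ and $A \subset \bar E$, then $P_E(A) = P_E(SA)$.

PROOF. It is sufficient to prove that $P(A) = P(SA)$.
$$P(SA) = P\left[\bigcup_{n=1}^{\infty} S(A \cap D_n)\right] = \sum_{n=1}^{\infty} P(S(A \cap D_n)) = \sum_{n=1}^{\infty} P[T^n(A \cap D_n)] = \sum_{n=1}^{\infty} P(A \cap D_n) = P\left[\bigcup_{n=1}^{\infty}(A \cap D_n)\right] = P(A).$$
For any $\mathfrak{F}$-measurable function $Z$ which is well defined up to a set of $P_E$ measure 0, $SZ$ and $S^{-1}Z$ are defined by $SZ(\omega) = Z(S\omega)$, $S^{-1}Z(\omega) = Z(S^{-1}\omega)$. Again $SZ$ and $S^{-1}Z$ are well defined up to sets of $P_E$ measure 0, and $SS^{-1}Z = S^{-1}SZ = Z$ with $P_E$ measure one. The following theorem follows immediately from Lemma 1.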

THEOREM 1. Let $Z$ be any random variable and $Z_k = S^kZ$, $k = 1, 2, 3, \cdots$. Then $\{Z_k\}$ is a stationary sequence under the conditional probability measure $P_E$.

The natural connection between $S$ and the successive recurrence times, $\nu_1, \nu_2, \cdots$, is revealed by the following theorem.

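THEOREM 2. $S^{k-1}\nu_1 = \nu_k$ with $P_E$ measure one for $k = 1, 2, 3, \cdots$. For any positive integer $k$ and any $k$ positive integers $n_1, n_2, \cdots, n_k$, $S^k = T^{n_1 + \cdots + n_k}$ on the set $[\nu_1 = n_1, \nu_2 = n_2, \cdots, \nu_k = n_k]$.

PROOF. For any $k$ positive integers $n_1, n_2, \cdots, n_k$, let
$$\begin{aligned}
D_{n_1 n_2 \cdots n_k} = {} & E \cap T^{-1}E' \cap \cdots \cap T^{-(n_1-1)}E' \cap T^{-n_1}E \cap T^{-(n_1+1)}E' \cap \cdots \cap T^{-(n_1+n_2-1)}E' \cap T^{-(n_1+n_2)}E \\
& \cap \cdots \cap T^{-(n_1+\cdots+n_{k-1}+1)}E' \cap \cdots \cap T^{-(n_1+\cdots+n_k-1)}E' \cap T^{-(n_1+\cdots+n_k)}E.
\end{aligned}$$
Then
$$T^{n_1}D_{n_1 n_2 \cdots n_k} \subset D_{n_2 \cdots n_k}, \quad T^{n_2}D_{n_2 \cdots n_k} \subset D_{n_3 \cdots n_k}, \quad \cdots, \quad T^{n_{k-1}}D_{n_{k-1}n_k} \subset D_{n_k}.$$
Hence if $\omega \in D_{n_1 n_2 \cdots n_k}$,
$$\begin{aligned}
S\omega &= T^{n_1}\omega \in D_{n_2 \cdots n_k}, \\
S^2\omega &= S(S\omega) = T^{n_2}S\omega = T^{n_2}T^{n_1}\omega \in D_{n_3 \cdots n_k}, \\
&\;\;\vdots \\
S^k\omega &= T^{n_k}S^{k-1}\omega = T^{n_k}T^{n_1 + \cdots + n_{k-1}}\omega = T^{n_1 + \cdots + n_k}\omega.
\end{aligned}$$
We observe that $[\nu_1 = n] = D_n$ and $[\nu_1 = n_1, \nu_2 = n_2, \cdots, \nu_k = n_k] = D_{n_1 n_2 \cdots n_k}$. Therefore the second half of the theorem is proved. The first half of the theorem will be proved only for the case $k = 2$. The general case can be proved similarly.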


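In the following, two sets are equal if they differ at most by a set of $P_E$ measure 0. From the definition of $S$,

(7) $$S^{-1}\omega = T^{-n}\omega \quad\text{if } \omega \in F_n, \quad n = 1, 2, 3, \cdots.$$

Hence for any positive integer $j$,
$$\begin{aligned}
[S\nu_1 = j] = S^{-1}[\nu_1 = j] &= \bigcup_{k=1}^{\infty} T^{-k}(F_k \cap D_j) \\
&= \bigcup_{k=1}^{\infty} T^{-k}(T^kE \cap T^{k-1}E' \cap \cdots \cap TE' \cap E \cap T^{-1}E' \cap \cdots \cap T^{-(j-1)}E' \cap T^{-j}E) \\
&= \bigcup_{k=1}^{\infty}(E \cap T^{-1}E' \cap \cdots \cap T^{-(k-1)}E' \cap T^{-k}E \cap T^{-(k+1)}E' \cap \cdots \cap T^{-(k+j-1)}E' \cap T^{-(k+j)}E) \\
&= \bigcup_{k=1}^{\infty}[\nu_1 = k, \nu_2 = j] = [\nu_2 = j].
\end{aligned}$$
Hence $S\nu_1 = \nu_2$ with $P_E$ measure one.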


III. Two Equalities for a Stationary Sequence. Let $X$ be any random variable and let

(8) $$X_n = T^nX, \quad n = 1, 2, \cdots.$$

Then $\{X_n\}$ is a stationary sequence under the measure $P$. For any positive integer $k$ define $X_{\nu_1 + \cdots + \nu_k}$ by

(9) $$X_{\nu_1 + \cdots + \nu_k} = X_{n_1 + \cdots + n_k}$$

on the set $[\nu_1 = n_1, \cdots, \nu_k = n_k]$. Then $X_{\nu_1 + \cdots + \nu_k}$ are well defined with $P_E$ measure one. By Theorem 2, $S^kX = T^{n_1 + \cdots + n_k}X$ on the set $[\nu_1 = n_1, \cdots, \nu_k = n_k]$. Hence

(10) $$X_{\nu_1 + \cdots + \nu_k} = S^kX.$$


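More generally, for any positive integer $n$, let $f_n(x_1, \cdots, x_n)$ be a Borel measurable function defined on the $n$-dimensional Euclidean space. Define
$$Z_k = f_{\nu_k}(X_{\nu_1 + \cdots + \nu_{k-1} + 1}, X_{\nu_1 + \cdots + \nu_{k-1} + 2}, \cdots, X_{\nu_1 + \cdots + \nu_k})$$
as follows:

(11) $$Z_k = f_{n_k}(X_{n_1 + \cdots + n_{k-1} + 1}, X_{n_1 + \cdots + n_{k-1} + 2}, \cdots, X_{n_1 + \cdots + n_k})$$

on the set $[\nu_1 = n_1, \cdots, \nu_k = n_k]$. Then $Z_1, Z_2, \cdots$ are well defined with $P_E$ measure one. We shall show that

(12) $$Z_{k+1} = SZ_k, \quad k = 1, 2, \cdots$$

with $P_E$ measure one. First,
$$SZ_k(\omega) = Z_k(S\omega) = f_{n_{k+1}}[X_{n_2 + \cdots + n_k + 1}(S\omega), \cdots, X_{n_2 + \cdots + n_{k+1}}(S\omega)]$$
if $S\omega \in [\nu_1 = n_2, \nu_2 = n_3, \cdots, \nu_k = n_{k+1}]$, or, equivalently, if $\omega \in [\nu_2 = n_2, \nu_3 = n_3, \cdots, \nu_{k+1} = n_{k+1}]$. However, $S\omega = T^{n_1}\omega$ if $\omega \in [\nu_1 = n_1]$. Hence, for $\omega \in [\nu_1 = n_1, \nu_2 = n_2, \cdots, \nu_{k+1} = n_{k+1}]$,
$$\begin{aligned}
SZ_k(\omega) &= f_{n_{k+1}}(X_{n_2 + \cdots + n_k + 1}(T^{n_1}\omega), \cdots, X_{n_2 + \cdots + n_{k+1}}(T^{n_1}\omega)) \\
&= f_{n_{k+1}}(X_{n_1 + n_2 + \cdots + n_k + 1}(\omega), \cdots, X_{n_1 + n_2 + \cdots + n_{k+1}}(\omega)) \\
&= Z_{k+1}(\omega).
\end{aligned}$$
Hence (12) is proved, and $Z_1, Z_2, \cdots$ form a stationary sequence under $P_E$ measure.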

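For special cases of $\{Z_k\}$, we have (a), (b), (c).

(a) Let $f_n(x_1, \cdots, x_n) = n$; then $Z_1 = \nu_1$, $Z_2 = \nu_2$, $\cdots$.

(b) Let $f_n(x_1, \cdots, x_n) = x_n$; then $Z_1 = X_{\nu_1}$, $Z_2 = X_{\nu_1 + \nu_2}$, $\cdots$.

(c) Let $f_n(x_1, \cdots, x_n) = x_1 + \cdots + x_n$; then $Z_1 = X_1 + \cdots + X_{\nu_1}$, $Z_2 = X_{\nu_1 + 1} + \cdots + X_{\nu_1 + \nu_2}$, $\cdots$.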

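THEOREM 3. Let $X$ be a $P$ integrable random variable and let $X_n = T^nX$, $n = 1, 2, 3, \cdots$. If $T$ is ergodic, then $X_1 + \cdots + X_{\nu_1}$ is $P_E$ integrable and

(13) $$\int (X_1 + \cdots + X_{\nu_1})\,dP_E = \left(\int X\,dP\right)\left(\int \nu_1\,dP_E\right) = [1/P(E)]\left(\int X\,dP\right).$$

PROOF. It has been proved in [3] that, if $T$ is ergodic, then $\nu_1, \nu_2, \cdots$ are well defined with $P$ measure one, and $\lim_{k\to\infty}(\nu_1 + \cdots + \nu_k)k^{-1} = 1/P(E)$ with $P$ measure one, and also $\int \nu_1\,dP_E = 1/P(E)$. Let $\Omega'$ be the set of all $\omega$ for which we have simultaneously
$$\lim_{k\to\infty}[\nu_1(\omega) + \cdots + \nu_k(\omega)]k^{-1} = 1/P(E),$$
$$\lim_{n\to\infty}[X_1(\omega) + \cdots + X_n(\omega)]n^{-1} = \int X\,dP.$$
Then $P(\Omega') = 1$. Hence the following equalities are true on $\Omega'$.

(14) $$\begin{aligned}
\lim_{k\to\infty}\big[(X_1 + \cdots + X_{\nu_1}) + \cdots + (X_{\nu_1 + \cdots + \nu_{k-1} + 1} + \cdots + X_{\nu_1 + \cdots + \nu_k})\big]k^{-1}
&= \lim_{k\to\infty}\frac{X_1 + \cdots + X_{\nu_1 + \cdots + \nu_k}}{\nu_1 + \cdots + \nu_k}\cdot\frac{\nu_1 + \cdots + \nu_k}{k} \\
&= \left[\int X\,dP\right](1/P(E)).
\end{aligned}$$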


If $X$ is non-negative with $P_E$ measure one, then $X_{\nu_1 + \cdots + \nu_k + 1} + \cdots + X_{\nu_1 + \cdots + \nu_{k+1}}$, $k = 1, 2, \cdots$, are non-negative with $P_E$ measure one. The conclusion of Theorem 3 follows easily from the fact that $X_1 + \cdots + X_{\nu_1}$, $X_{\nu_1 + 1} + \cdots + X_{\nu_1 + \nu_2}$, $\cdots$ form a stationary sequence under $P_E$ measure, together with the following statement: if non-negative functions $g_1, g_2, \cdots$ form a stationary sequence and $\lim_{k\to\infty}(g_1 + \cdots + g_k)k^{-1} = g$ with probability one, with $g$ integrable, then $g_1$ is integrable and the integral of $g_1$ is equal to the integral of $g$. This statement can be easily proved by the ergodic theorem. If $X$ is not non-negative, apply Theorem 3 to $|X|$. Thus we have that $|X_1| + \cdots + |X_{\nu_1}|$ is $P_E$ integrable and, therefore, $X_1 + \cdots + X_{\nu_1}$ is also $P_E$ integrable. The ergodic theorem again implies (13).
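Equality (13) can be checked exactly in a hypothetical finite model: a single cycle on eight points with uniform $P$, which is ergodic, and $E = \{0, 1, 2\}$. The sketch uses rational arithmetic. Each side is computed directly from its definition, and the values of the random variable $X$ are an arbitrary illustrative choice.

```python
from fractions import Fraction

ORDER = [0, 5, 2, 7, 1, 4, 6, 3]  # hypothetical cycle: T(ORDER[i]) = ORDER[i + 1]
T = {ORDER[i]: ORDER[(i + 1) % len(ORDER)] for i in range(len(ORDER))}
E = {0, 1, 2}
X = {0: 3, 1: -1, 2: 4, 3: 1, 4: -5, 5: 9, 6: 2, 7: 6}  # illustrative X
P_of_E = Fraction(len(E), len(ORDER))  # P(E) under the uniform measure


def block_sum(w):
    """X_1(w) + ... + X_{nu_1}(w) = X(T w) + ... + X(T^{nu_1} w)."""
    x = T[w]
    total = X[x]
    while x not in E:
        x = T[x]
        total += X[x]
    return total


lhs = Fraction(sum(block_sum(w) for w in E), len(E))  # integral w.r.t. P_E
int_X_dP = Fraction(sum(X.values()), len(ORDER))      # integral w.r.t. P
rhs = int_X_dP / P_of_E
print(lhs, rhs)
```

In this model the blocks starting from the points of $E$ tile the whole cycle, which is the finite analogue of the ergodic argument above.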

THEOREM 4. Let the random variable $X$ be $P_E$ integrable and let $X_n = T^nX$, $n = 1, 2, \cdots$. If $T$ is ergodic, then $X_{\nu_1}$ is $P_E$ integrable and

(15) $$\int X_{\nu_1}\,dP_E = \int X\,dP_E.$$


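PROOF. For any subset $A$ of $\Omega$, let $I_A$ be the real valued function defined on $\Omega$ by
$$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{otherwise.} \end{cases}$$
Then $T^nI_A = I_{T^{-n}A}$. Let $X' = XI_E$ and $X'_n = T^nX'$, $n = 1, 2, \cdots$. Then $\int X'\,dP = \int_E X\,dP = P(E)\int X\,dP_E$, so that $X'$ is $P$ integrable. Applying Theorem 3 to the sequence $X'_1, X'_2, \cdots$, we have
$$\int (X'_1 + \cdots + X'_{\nu_1})\,dP_E = [1/P(E)]\left[\int X'\,dP\right] = \int X\,dP_E.$$
However, on $E$,
$$X'_1 + \cdots + X'_{\nu_1} = X_1I_{T^{-1}E} + \cdots + X_{\nu_1}I_{T^{-\nu_1}E} = X_{\nu_1}.$$
Hence $X_{\nu_1}$ is $P_E$ integrable and $\int X_{\nu_1}\,dP_E = \int X\,dP_E$.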
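In the same kind of hypothetical finite model (a single cycle on eight points, uniform $P$, $E = \{0, 1, 2\}$), equality (15) reduces to the fact that $S$ permutes $E$: $X_{\nu_1}(\omega) = X(S\omega)$, so averaging over $E$ gives the same result before and after applying $S$. A short exact check, with an illustrative $X$:

```python
from fractions import Fraction

ORDER = [0, 5, 2, 7, 1, 4, 6, 3]  # hypothetical cycle: T(ORDER[i]) = ORDER[i + 1]
T = {ORDER[i]: ORDER[(i + 1) % len(ORDER)] for i in range(len(ORDER))}
E = {0, 1, 2}
X = {0: 3, 1: -1, 2: 4, 3: 1, 4: -5, 5: 9, 6: 2, 7: 6}  # illustrative X


def X_at_return(w):
    """X_{nu_1}(w) = X(T^{nu_1} w) = X(S w)."""
    x = T[w]
    while x not in E:
        x = T[x]
    return X[x]


lhs = Fraction(sum(X_at_return(w) for w in E), len(E))  # integral of X_{nu_1} dP_E
rhs = Fraction(sum(X[w] for w in E), len(E))            # integral of X dP_E
print(lhs, rhs)
```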
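COROLLARY 1. If $T$ is ergodic, so is $S$.
PROOF. Let $X$, $X_n$, $X'_n$ be as in Theorem 4. Applying (14) to $\{X'_n\}$, we have
$$\lim_{k\to\infty}\big[(X'_1 + \cdots + X'_{\nu_1}) + \cdots + (X'_{\nu_1 + \cdots + \nu_{k-1} + 1} + \cdots + X'_{\nu_1 + \cdots + \nu_k})\big]k^{-1} = \left[\int X'\,dP\right][1/P(E)] = \int X\,dP_E$$
with $P_E$ measure one. However, by (10),
$$X'_{\nu_1 + \cdots + \nu_{k-1} + 1} + \cdots + X'_{\nu_1 + \cdots + \nu_k} = X_{\nu_1 + \cdots + \nu_k} = S^kX.$$
Hence

(16) $$\lim_{k\to\infty}(SX + \cdots + S^kX)k^{-1} = \int X\,dP_E$$

with $P_E$ measure one. Since (16) is true for any $P_E$ integrable random variable $X$, the conclusion that $S$ is ergodic is thus proved.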


REFERENCES
[1] David Blackwell, “On an equation of Wald,” Ann. Math. Stat., Vol. 17 (1946), pp. 85-87.
[2] P. R. Halmos, Lectures on Ergodic Theory, Math. Soc. of Japan, Tokyo, 1956.
[3] Shu-Teh Chen Moy, “Successive recurrence times in a stationary process,” Ann. Math. Stat., Vol. 30 (1959), pp. 1254-1257.





MULTIVARIATE CHEBYSHEV INEQUALITIES¹

By Albert W. Marshall and Ingram Olkin²
Stanford University; Michigan State University and Stanford University


1. Summary and Introduction. If $X$ is a random variable with $EX^2 = \sigma^2$, then by Chebyshev's inequality,

(1.1) $$P\{|X| \ge \epsilon\} \le \sigma^2/\epsilon^2.$$

If in addition $EX = 0$, one obtains a corresponding one-sided inequality

(1.2) $$P\{X \ge \epsilon\} \le \sigma^2/(\epsilon^2 + \sigma^2)$$

(see, e.g., [8] p. 198). In each case a distribution for $X$ is known that results in equality, so that the bounds are sharp. By a change of variable we can take $\epsilon = 1$.
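The sharpness claims for (1.1) and (1.2) can be checked with two standard extremal distributions. These particular distributions are not spelled out in the text above. For (1.1), assuming $\sigma \le \epsilon$, put mass $\sigma^2/(2\epsilon^2)$ at each of $\pm\epsilon$ and the rest at 0. For (1.2), put mass $\sigma^2/(\epsilon^2 + \sigma^2)$ at $\epsilon$ and the rest at $-\sigma^2/\epsilon$. A short Python sketch verifies the moments and the attained probabilities in exact rational arithmetic. The values of $\sigma^2$ and $\epsilon$ are arbitrary.

```python
from fractions import Fraction


def moments(dist):
    """Mean and second moment of a finite distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    second = sum(p * x * x for x, p in dist.items())
    return mean, second


def two_sided_extremal(var, eps):
    """Attains P{|X| >= eps} = var / eps^2 with EX^2 = var (needs var <= eps^2)."""
    p = var / (2 * eps * eps)
    return {eps: p, -eps: p, Fraction(0): 1 - 2 * p}


def one_sided_extremal(var, eps):
    """Attains P{X >= eps} = var / (eps^2 + var) with EX = 0, EX^2 = var."""
    p = var / (eps * eps + var)
    return {eps: p, -var / eps: 1 - p}


var, eps = Fraction(1, 2), Fraction(1)
d1 = two_sided_extremal(var, eps)
d2 = one_sided_extremal(var, eps)
print(moments(d1), sum(p for x, p in d1.items() if abs(x) >= eps))
print(moments(d2), sum(p for x, p in d2.items() if x >= eps))
```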

There are many possible multivariate extensions of (1.1) and (1.2). Those providing bounds for $P\{\max_{1 \le j \le k}|X_j| \ge 1\}$ and $P\{\max_{1 \le j \le k} X_j \ge 1\}$ have been investigated in [3, 5, 9] and [4], respectively. We consider here various inequalities involving (i) the minimum component or (ii) the product of the components of a random vector. Derivations and proofs of sharpness for these two classes of extensions show remarkable similarities. Some of each type occur as special cases of a general theorem in Section 3.

Bounds are given under various assumptions concerning variances, covariances 
and independence. 

Notation. We denote the vector $(1, \cdots, 1)$ by $e$ and $(0, \cdots, 0)$ by $0$; the dimensionality will be clear from the context. If $x = (x_1, \cdots, x_k)$ and $y = (y_1, \cdots, y_k)$, we write $x \ge y$ $(x > y)$ to mean $x_j \ge y_j$ $(x_j > y_j)$, $j = 1, 2, \cdots, k$. If $\Sigma = (\sigma_{ij}): k \times k$ is a moment matrix, for convenience we write $\sigma_{jj} = \sigma_j^2$, $j = 1, \cdots, k$. Unless otherwise stated, we assume that $\Sigma$ is positive definite.


2. On Proving Sharpness. Chebyshev inequalities are usually proved by defining a non-negative function $f$ on $R^k$ ($k$-dimensional Euclidean space) such that $f(x) \ge 1$ for all $x \in T \subset R^k$. Then if $X$ is a $k$-dimensional random vector,

(2.1) $$Ef(X) = \int_{\{X \in T\}} f(X)\,dP + \int_{\{X \notin T\}} f(X)\,dP \ge \int_{\{X \in T\}} f(X)\,dP \ge P\{X \in T\}.$$

Ordinarily, one states the inequality with some hypotheses $\mathcal{H}$ on the distribution of $X$ (e.g., $EX'X = \Sigma$) that permit an explicit determination of the bound $Ef(X)$.


Received September 1, 1959; revised July 12, 1960.

¹ This work was sponsored in part by the Office of Ordnance Research at Michigan State University and the Office of Naval Research at Stanford University.

² Now at the University of Minnesota.




possible under $\mathcal{H}$, there exists a random vector $Z$ satisfying $\mathcal{H}$ with $P\{Z \in T\} \ge Ef(Z) - \epsilon$.

Except in Section 6, the sharpness of (2.1) will follow as a consequence of the stronger result that there exists a random vector (satisfying $\mathcal{H}$) for which equality is attained.

If one is to prove (2.1) sharp by exhibiting a distribution for $X$ attaining equality, then that distribution must assign probability only to points $x \in T$ for which $f(x) = 1$ and to points $x \notin T$ for which $f(x) = 0$. Hence, to obtain a distribution for $X$ achieving equality in (2.1), we begin by considering distributions that assign probability only to the rows of a matrix $\begin{pmatrix} C \\ W \end{pmatrix}$ with

(2.2) $$f(c^{(i)}) = 0, \qquad f(w^{(j)}) = 1,$$

where $c^{(i)}$ is the $i$th row of $C: m \times k$ and $w^{(j)}$ is the $j$th row of $W: n \times k$. Since $f(x) \ge 1$ for $x \in T$, (2.2) implies $c^{(i)} \notin T$ for all $i$, but we still must specify that

(2.3) $$w^{(j)} \in T, \quad\text{for all } j.$$


Conditions (2.2) and (2.3) may be sufficient to define both $C$ and $W$ (e.g., see [4, 5]). However, if $f$ is a quadratic form that is not positive definite (p.d.) but only positive semi-definite (p.s.d.), then $\{x: f(x) = 0\}$ is not finite and (2.2) will not define $C$. This means that when p.s.d. functions are used, there is no clear-cut way to find the spectrum of a distribution attaining equality.

If $P\{c^{(i)}\} = p_i$, $i = 1, 2, \cdots, m$, and $P\{w^{(j)}\} = q_j$, $j = 1, 2, \cdots, n$, then attainment of equality in (2.1) means that

(2.4) $$\sum_j q_j = q = Ef(X), \qquad \sum_i p_i = 1 - q.$$


Most inequalities considered in this paper are stated with the hypotheses $EX'X = \Sigma$, i.e.,

(2.5) $$C'D_pC + W'D_qW = \Sigma,$$

and sometimes also with $EX = 0$, namely,

(2.6) $$eD_pC + eD_qW = 0,$$

where $D_p = \operatorname{diag}(p_1, \cdots, p_m)$ and $D_q = \operatorname{diag}(q_1, \cdots, q_n)$.

One can try to solve these equations subject to conditions (2.3) and (2.4) with the realization that (2.2) is then satisfied. Then by (2.1), such a solution satisfies $c^{(i)} \notin T$ for all $i$. The above requirements may not be sufficient to define the various parameters, in which case the example attaining equality is not unique.

If $T$ is symmetric about the origin, then we lose no generality in assuming that the distribution attaining equality is symmetric about the origin, since (2.5) and the probability assigned to $T$ are left unchanged if $C$, $W$, $D_p$, and $D_q$ are replaced by
$$\begin{pmatrix} C \\ -C \end{pmatrix}, \quad \begin{pmatrix} W \\ -W \end{pmatrix}, \quad \frac{1}{2}\begin{pmatrix} D_p & 0 \\ 0 & D_p \end{pmatrix}, \quad\text{and}\quad \frac{1}{2}\begin{pmatrix} D_q & 0 \\ 0 & D_q \end{pmatrix},$$
respectively, in which case (2.6) is automatically satisfied.


3. Bounds Involving Convex Sets. If we wish the bound $Ef(X)$ to be in terms of the first and second moments, then $f(x)$ must be quadratic, possibly with linear terms, i.e., $f(x) = (x - a)A(x - a)'$. A bound is then obtained by minimizing $Ef(X)$ subject to the conditions $f(x) \ge 0$, $f(x) \ge 1$ for $x \in T$. If the complement of $T$ is bounded, then clearly these conditions are satisfied only for p.d. $A$. However, if $T$ is either convex or the union of two convex sets, a minimizing $A$ cannot be p.d. For if $A$ is p.d., then by (2.2) $C = 0: 1 \times k$. Furthermore, $\{x: f(x) < 1\}$ is strictly convex (an ellipsoid), so $\{f(x) = 1\} \cap T$ has at most two points and $W$ is $1 \times k$ or $2 \times k$. However, a three point distribution is not in general sufficient to fulfill all the conditions $EX'X = \Sigma$.

The following theorem gives conditions when a minimizing $A$ has rank 1, so that $A = a'a$ for some $a: 1 \times k$, and the above procedure leads to sharp inequalities.


3.1. A General Theorem.

THEOREM 3.1. Let $X = (X_1, \cdots, X_k)$ be a random vector with $EX = 0$, $EX'X = \Sigma$. Let $T = T_+ \cup \{x: -x \in T_+\}$, where $T_+ \subset R^k$ is a closed, convex set.

(i) If $\mathcal{A} = \{a \in R^k: ax' \ge 1 \text{ for all } x \in T_+\}$, then

(3.1) $$P\{X \in T\} \le \inf_{a \in \mathcal{A}} a\Sigma a',$$

(3.2) $$P\{X \in T_+\} \le \inf_{a \in \mathcal{A}} (a\Sigma a')/(1 + a\Sigma a').$$

(ii) Equality in (3.1) can be attained whenever the bound is $\le 1$; equality in (3.2) can always be attained.
REMARK. If $0 \notin T$, then $\mathcal{A}$ is non-empty, since $T_+$ and $\{0\}$ have a separating hyperplane. If $0 \in T$, then the bound one is sharp for both $T$ and $T_+$, and we henceforth assume that $0 \notin T$.


PROOF OF (i). If $a \in \mathcal{A}$, then (3.1) and (3.2) follow from (2.1) with $f(x) = (ax')^2$ and $f(x) = (ax' + a\Sigma a')^2/(1 + a\Sigma a')^2$, respectively.

Note that the hypothesis $EX = 0$ is not required for (3.1).

To prove (ii) we need the following lemmas. We write
$$q = q(a) = a\Sigma a', \qquad q^* = q^*(a) = q/(1 + q), \qquad w = w(a) = a\Sigma/q.$$


LEMMA 3.2. $\Sigma - qw'w$ is p.s.d. and $\Sigma - q^*w'w$ is p.d. (Recall that $\Sigma$ was assumed to be p.d.)

PROOF. From Cauchy's inequality, for all $x \in R^k$,
$$x\Sigma x' \ge (xw')^2/(w\Sigma^{-1}w') = q(xw')^2 \ge q^*(xw')^2.$$





If $x \ne 0$, then strict inequality must hold in one of the two inequalities. ∎

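LEMMA 3.3. There exists an $a_0 \in \mathcal{A}$ with $\inf_{a \in \mathcal{A}} a\Sigma a' = a_0\Sigma a_0'$. For such an $a_0$, $w_0 = w(a_0) \in T_+$.

PROOF. Since we can obtain $\Sigma = I$ by a change of variables, it is clear that there exists (uniquely) an $a_0 \in \mathcal{A}$ with $\inf_{a \in \mathcal{A}} a\Sigma a' = a_0\Sigma a_0'$.

To show that $w_0 \in T_+$, assume the contrary, so that there is a hyperplane separating $w_0$ and $T_+$, i.e., there is a vector $v$ and a number $\alpha$ with $vw_0' < \alpha$, $vt' \ge \alpha$ for all $t \in T_+$. If we replace $v$ by $v + (1 - \alpha)a_0$, we can assume $\alpha = 1$. Since $vw_0' < 1$ implies $a_0\Sigma a_0' > a_0\Sigma v'$, for sufficiently small $\epsilon$,
$$\epsilon(v\Sigma v' - 2a_0\Sigma v' + a_0\Sigma a_0') < 2(a_0\Sigma a_0' - a_0\Sigma v').$$
This is equivalent to $u\Sigma u' < a_0\Sigma a_0'$, where $u = \epsilon v + (1 - \epsilon)a_0$. But $u \in \mathcal{A}$, which is a contradiction. Hence $w_0 \in T_+$.

REMARK. One can also obtain $a_0\Sigma a_0'$ by computing $1/[\inf_{t \in T_+} t\Sigma^{-1}t']$, since for $t \in T_+$, $(a_0\Sigma a_0')(t\Sigma^{-1}t') \ge (a_0t')^2 \ge 1$, which implies that $a_0\Sigma a_0' = 1/(w_0\Sigma^{-1}w_0') \ge 1/(t\Sigma^{-1}t')$ for all $t \in T_+$.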

Proor oF (ii). For convenience the subscripts on go = (do), qo = q*(ao), 
and wy = w(do) will be omitted. 

We first prove that (3.1) is sharp. Choose r 2 k and let M: r x k be such 
that M’M = = — qw’w. Choose D = diag(p., ---, p-), such that p,; > 0, 
=pi = 1 — gq, and define C = D™“*M. Consider a random vector Z 
with P{Z = c} = PZ = —c”} = p,/2, (i =1,---,r), iZ = wt = P{Z 
= —w} = q/2, where c is the ith row of C. Then EZ = 0, EZ'Z = C’'DC + 
qu’w = &. 

By Lemma 3.3, we T so PiZ eT} 2 gq, but PiZ eT} s q by (3.1). Hence 
equality is attained whenever the random variable X has the same distribution 
as Z. 

We next prove that (3.2) is sharp. By Lemma 3.2, there exists a non-singular matrix $M: k \times k$ such that $M'M = \Sigma - q^*w'w$. Choose an orthogonal matrix $\Gamma: k \times k$ which rotates $-q^*wM^{-1}$ into the positive orthant, i.e., $-q^*wM^{-1}\Gamma > 0$. Define $D = \mathrm{diag}(p_1, \dots, p_k)$ and $C$ by $eD^{1/2} = [(p_1)^{1/2}, \dots, (p_k)^{1/2}] = -q^*wM^{-1}\Gamma$, $C = D^{-1/2}\Gamma'M$. Consider a random vector $Z$ with $P\{Z = c^{(i)}\} = p_i$ $(i = 1, \dots, k)$, $P\{Z = w\} = q^*$, where $c^{(i)}$ is the $i$th row of $C$. Then

$$EZ = eDC + q^*w = (-q^*wM^{-1}\Gamma)(\Gamma'M) + q^*w = 0,$$
$$EZ'Z = C'DC + q^*w'w = M'M + q^*w'w = \Sigma.$$

Let us verify that $\sum p_i = 1 - q^* = 1/(1 + q)$. Noting that $w\Sigma^{-1}w' = 1/q$, so that $w(\Sigma - q^*w'w)^{-1}w' = (1/q)/(1 - q^*/q) = (1 + q)/q^2$, we have
$$\sum p_i = eDe' = (q^*)^2\,w(\Sigma - q^*w'w)^{-1}w' = 1/(1 + q).$$


By Lemma 3.3, $w \in T_+$, so that $P\{Z \in T_+\} \ge q^*$. Hence by (3.2), $P\{Z \in T_+\} = q^*$, and equality in (3.2) is attained whenever $X$ has the same distribution as $Z$.
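As a numerical sanity check of this construction, the sketch below builds the distribution of $Z$ for an illustrative p.d. $\Sigma$ and an arbitrary vector $a$ (the moment identities hold for any $a \ne 0$, not only $a = a_0$), and confirms $EZ = 0$, $EZ'Z = \Sigma$ and $\sum p_i = 1/(1+q)$. The Householder reflection used for $\Gamma$ is simply one convenient orthogonal matrix with the required property, and the function name `extremal_distribution_32` is hypothetical.

```python
import numpy as np

def extremal_distribution_32(Sigma, a):
    """Support points and probabilities of a Z attaining equality in (3.2).

    Row-vector conventions as in the text: q = a Sigma a', q* = q/(1+q),
    w = a Sigma / q.  Rows of `points` are c^(1), ..., c^(k), w.
    """
    Sigma = np.asarray(Sigma, dtype=float)
    a = np.asarray(a, dtype=float)
    k = a.size
    q = a @ Sigma @ a
    qs = q / (1.0 + q)
    w = (a @ Sigma) / q
    # Sigma - q* w'w is p.d. (Lemma 3.2); a Cholesky factor gives M with M'M = it.
    M = np.linalg.cholesky(Sigma - qs * np.outer(w, w)).T
    x = -qs * (w @ np.linalg.inv(M))          # the row vector -q* w M^{-1}
    # Householder reflection Gamma (orthogonal) with x Gamma = |x| e / sqrt(k) > 0.
    y = np.linalg.norm(x) * np.ones(k) / np.sqrt(k)
    u = x - y
    if np.allclose(u, 0.0):
        Gamma = np.eye(k)
    else:
        Gamma = np.eye(k) - 2.0 * np.outer(u, u) / (u @ u)
    sqrt_p = x @ Gamma                         # = e D^{1/2}, all entries positive
    C = (Gamma.T @ M) / sqrt_p[:, None]        # C = D^{-1/2} Gamma' M
    points = np.vstack([C, w])
    probs = np.append(sqrt_p ** 2, qs)
    return points, probs, q

# Illustrative p.d. Sigma and arbitrary a.
Sigma = np.array([[2.0, 0.5, 0.3], [0.5, 1.0, -0.2], [0.3, -0.2, 1.5]])
a = np.array([0.4, 0.7, 0.1])
points, probs, q = extremal_distribution_32(Sigma, a)
mean = probs @ points                          # EZ
second = points.T @ (probs[:, None] * points)  # EZ'Z
```

Whether $Z$ actually attains equality still depends on $a = a_0$ and $w \in T_+$ (Lemma 3.3); the code only checks the moment conditions.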

Remark. Suppose $T_+$ is not convex. The following example shows that (ii) need no longer be true even when $\mathcal{A}$ is non-empty.





MULTIVARIATE CHEBYSHEV INEQUALITIES 1005 


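Let $k = 2$, $T_+ = \{z: z \ge 0,\ z_1^2 + z_2^2 \ge 1\}$, and let $T = T_+ \cup \{z: -z \in T_+\}$. Then $P\{X \in T\} \le \sigma_1^2 + \sigma_2^2$ follows from (2.1) with $f(z) = z_1^2 + z_2^2$. Now $az' \ge 1$ on $T_+$ if and only if $a_1 \ge 1$, $a_2 \ge 1$. But $a\Sigma a' > \sigma_1^2 + \sigma_2^2$ whenever $\sigma_{12} > 0$ and $a_1 \ge 1$, $a_2 \ge 1$, so the bound of (3.1) then exceeds the valid bound $\sigma_1^2 + \sigma_2^2$ and cannot be attained.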


3.2. Bounds Involving the Minimum Component. 


Theorem 3.4. If $X = (X_1, \dots, X_k)$ is a random vector with $EX = 0$ and $EX'X = \Sigma$, then

(3.3) $P\{X \in T\} = P\{\min_i X_i \ge 1 \text{ or } \min_i(-X_i) \ge 1\} \le \min 1/(e\Sigma_1^{-1}e')$,

(3.4) $P\{X \in T_+\} = P\{\min_i X_i \ge 1\} \le \min 1/(1 + e\Sigma_1^{-1}e')$,

where the minimum is taken over all principal submatrices $\Sigma_1$ of $\Sigma$ such that $e\Sigma_1^{-1} > 0$.

Equality can be attained in (3.3) whenever the bound $\le 1$; equality can always be attained in (3.4).

There always exist principal submatrices $\Sigma_1$ of $\Sigma$ such that $e\Sigma_1^{-1} > 0$ (e.g., if $\Sigma_1$ is $1 \times 1$), so that (3.3) and (3.4) always provide a bound.

Proof. The theorem follows from Theorem 3.1 if we show that the bound of (3.3) is the minimum of $a\Sigma a'$ for $a \in \mathcal{A} = \{a: a \ge 0,\ ae' \ge 1\}$.

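Suppose the minimum occurs at an $a$ whose non-vanishing coordinates are $\tilde a > 0$; clearly $\tilde a e' = 1$. Let $\Sigma_1$ be the principal submatrix of $\Sigma$ corresponding to these coordinates. Since the minimum over $\tilde a$ does not occur on the boundary (where some component is zero), the minimizing $\tilde a$ must satisfy $2\tilde a\Sigma_1 + \lambda e = 0$, obtained by differentiating $\tilde a\Sigma_1\tilde a' + \lambda(\tilde a e' - 1)$. Using $\tilde a e' = 1$, we obtain

$$\tilde a = e\Sigma_1^{-1}/(e\Sigma_1^{-1}e') > 0,$$

and hence $\tilde a\Sigma_1\tilde a' = 1/(e\Sigma_1^{-1}e')$.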


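Inequalities (3.3) and (3.4) can also be obtained from (2.1) with $f(z) = (e\Sigma_1^{-1}\tilde z')^2/(e\Sigma_1^{-1}e')^2$ and $f(z) = (e\Sigma_1^{-1}\tilde z' + 1)^2/(e\Sigma_1^{-1}e' + 1)^2$, respectively, where $\tilde z$ is the subvector of $z$ corresponding to $\Sigma_1$, so that $E\tilde z'\tilde z = \Sigma_1$. ||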
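For small $k$ the bounds of Theorem 3.4 can be computed by brute force: scan every principal submatrix $\Sigma_1$, keep those with $e\Sigma_1^{-1} > 0$, and take the largest $e\Sigma_1^{-1}e'$. The sketch below does this (the function name is hypothetical, and the cost grows like $2^k$). It reproduces Example 2 (bound $\sigma^2[1 + (k-1)\rho]/k$) and Example 5 (bound $\sigma_1^2$) for illustrative parameter values.

```python
import itertools
import numpy as np

def min_component_bounds(Sigma, tol=1e-12):
    """Right-hand sides of (3.3) and (3.4) by enumerating principal submatrices.

    Keeps the largest e Sigma_1^{-1} e' over index subsets whose Sigma_1 has
    e Sigma_1^{-1} > 0 componentwise.  Cost grows like 2**k: small k only.
    """
    Sigma = np.asarray(Sigma, dtype=float)
    k = Sigma.shape[0]
    best = 0.0
    for m in range(1, k + 1):
        for idx in itertools.combinations(range(k), m):
            sub = Sigma[np.ix_(idx, idx)]
            v = np.linalg.solve(sub, np.ones(m))   # Sigma_1^{-1} e' = (e Sigma_1^{-1})'
            if np.all(v > tol):
                best = max(best, v.sum())          # e Sigma_1^{-1} e'
    return 1.0 / best, 1.0 / (1.0 + best)

# Example 2 (equicorrelated): bound of (3.3) should be sigma^2 [1 + (k-1) rho] / k.
k, s2, rho = 4, 2.0, 0.3
b33, b34 = min_component_bounds(s2 * ((1 - rho) * np.eye(k) + rho * np.ones((k, k))))

# Example 5 (partial sums): bound of (3.3) should be sigma_1^2.
sig2 = np.cumsum([1.0, 2.0, 0.5])              # sigma_i^2 = tau_1^2 + ... + tau_i^2
c33, c34 = min_component_bounds(np.minimum.outer(sig2, sig2))
```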

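Remark. If $\Sigma_1 = (\Sigma_{ij})$, $i, j = 1, 2$, and $\Sigma_2 = \Sigma_{11}$, then $e\Sigma_1^{-1}e' \ge e\Sigma_2^{-1}e'$. Thus, in order to find the bound of (3.3) or (3.4), one need not investigate all submatrices $\Sigma_1$ of $\Sigma$ for which $e\Sigma_1^{-1} > 0$: principal submatrices of an admissible $\Sigma_1$ cannot give a smaller bound.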

Some special cases of interest for which the bounds of Theorem 3.4 can be 
written more explicitly are given in the following examples. 

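Example 1. If $k = 2$ and $\sigma_1^2 \le \sigma_2^2$, then
$$\min \frac{1}{e\Sigma_1^{-1}e'} = \begin{cases} \sigma_1^2, & \text{if } \sigma_1^2 \le \sigma_{12},\\[4pt] \dfrac{\sigma_1^2\sigma_2^2 - \sigma_{12}^2}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}, & \text{if } \sigma_1^2 > \sigma_{12}.\end{cases}$$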


Example 2. If $\sigma_{ii} = \sigma^2$, $\sigma_{ij} = \sigma^2\rho$ $(i \ne j)$, then $e\Sigma^{-1} > 0$ and
$e\Sigma^{-1}e' = k/\{\sigma^2[1 + (k-1)\rho]\}$.

Example 3. Let $\Sigma = n(D_p - p'p)$, where $D_p = \mathrm{diag}(p_1, \dots, p_k)$, $p = (p_1, \dots, p_k)$, $u = \sum_1^k p_i < 1$, $v = \sum_1^k p_i^{-1}$. It is easily verified that $e\Sigma^{-1} > 0$ and $e\Sigma^{-1}e' = [v + k^2/(1 - u)]/n$. If $X = (X_1, \dots, X_k, n - \sum_1^k X_i)$ has a multinomial distribution with parameters $p_1, \dots, p_k, 1 - u$, the covariance matrix of $X$ is singular, but the covariance matrix of $(X_1, \dots, X_k)$ is $\Sigma$.







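Example 4. A special form of Green's matrix $A = (a_{ij})$, $a_{ij} = a_{ji} = u_iv_j$ $(i \le j)$, is given by $\Sigma = (\sigma_{ij})$ with $\sigma_{ii} = 1$, $\sigma_{ij} = \prod_{l=i}^{j-1}\alpha_l$ $(i < j)$, which is p.d. if $\alpha_i^2 < 1$ for all $i$. In this case $\Sigma^{-1}$ has all elements zero except on the main, super- and sub-diagonals. It can be verified that $e\Sigma^{-1} > 0$, and

$$e\Sigma^{-1}e' = \frac{1}{1 + \alpha_1} + \frac{1}{1 + \alpha_{k-1}} + \sum_{i=1}^{k-2}\frac{1 - \alpha_i\alpha_{i+1}}{(1 + \alpha_i)(1 + \alpha_{i+1})}.$$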
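The closed form
$e\Sigma^{-1}e' = 1/(1+\alpha_1) + 1/(1+\alpha_{k-1}) + \sum_{i=1}^{k-2}(1 - \alpha_i\alpha_{i+1})/[(1+\alpha_i)(1+\alpha_{i+1})]$
for Example 4 (with $\sigma_{ii} = 1$ and $\sigma_{ij} = \alpha_i\cdots\alpha_{j-1}$ for $i < j$) can be checked numerically against direct inversion. The sketch below, with hypothetical function names and arbitrary $\alpha_i$ satisfying $\alpha_i^2 < 1$, also confirms that $e\Sigma^{-1} > 0$ and that $\Sigma^{-1}$ is tridiagonal.

```python
import numpy as np

def green_sigma(alpha):
    """Example 4: sigma_ii = 1, sigma_ij = alpha_i alpha_{i+1} ... alpha_{j-1} for i < j."""
    k = len(alpha) + 1
    S = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            S[i, j] = S[j, i] = np.prod(alpha[i:j])
    return S

def green_e_sigma_inv_e(alpha):
    """Closed form for e Sigma^{-1} e' in Example 4."""
    a = np.asarray(alpha, dtype=float)
    inner = (1.0 - a[:-1] * a[1:]) / ((1.0 + a[:-1]) * (1.0 + a[1:]))
    return 1.0 / (1.0 + a[0]) + 1.0 / (1.0 + a[-1]) + inner.sum()

alpha = [0.5, -0.3, 0.8, 0.2]      # illustrative values with alpha_i^2 < 1
S = green_sigma(alpha)
Sinv = np.linalg.inv(S)
row = np.ones(len(S)) @ Sinv       # e Sigma^{-1}
```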


In the above examples where $\Sigma$ has the form $\sigma^2R$, one can replace $\Sigma$ by $DRD$, where $D = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_k)$. Then it may no longer be true that $e\Sigma^{-1} > 0$, and examples can be obtained in which various submatrices $\Sigma_1$ lead to the best bound. However, if $\rho = 0$ in Example 2 or $\alpha_i = 0$ in Example 4, then $\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_k^2)$, so that $e\Sigma^{-1} > 0$ and $e\Sigma^{-1}e' = \sum_1^k \sigma_i^{-2}$.

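Example 5. Let $Y_1, Y_2, \dots, Y_k$ be uncorrelated random variables with $EY_j = 0$, $EY_j^2 = \tau_j^2$, $j = 1, 2, \dots, k$, and suppose that $X_i = \sum_{j=1}^i Y_j$, $i = 1, 2, \dots, k$, are partial sums. Then $EX_i = 0$, $i = 1, 2, \dots, k$, and $EX'X = \Sigma = (\sigma_{ij})$ with $\sigma_{ij} = \sigma_i^2$, $i \le j$, where $\sigma_i^2 = \sum_{j=1}^i \tau_j^2$, so that $\sigma_1^2 \le \sigma_2^2 \le \cdots \le \sigma_k^2$. In this case $e\Sigma_1^{-1} > 0$ only for $\Sigma_1$: $1 \times 1$, and $\min 1/(e\Sigma_1^{-1}e') = \sigma_1^2$.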

Proof. If $U = (u_{ij})$ is upper triangular, $u_{ij} = 0$, $i > j$, $u_{ij} = \tau_i$, $i \le j$, then $\Sigma = U'U$ and $e\Sigma^{-1} = (\tau_1^{-2}, 0, \dots, 0)$. All principal submatrices of $\Sigma$ are of the same form as $\Sigma$, so that $e\Sigma_1^{-1} > 0$ only when $\Sigma_1$ is $1 \times 1$. ||

3.3. Bounds for products of random variables. 

Theorem 3.5. If $X = (X_1, \dots, X_k)$ is a random vector with $EX = 0$ and $EX'X = \Sigma$, then

(3.5) $P\{X \in T\} = P\{|\prod X_i| \ge 1 \text{ and } (X > 0 \text{ or } X < 0)\} \le \min_{a\in\mathcal{A}} a\Sigma a'$,

(3.6) $P\{X \in T_+\} = P\{|\prod X_i| \ge 1,\ X > 0\} \le \min_{a\in\mathcal{A}} (a\Sigma a')/(1 + a\Sigma a')$,

where $\mathcal{A}$ is given in Theorem 3.1.
There is a unique solution $a^*$ of

(3.7) $a\Sigma = (a\Sigma a')\,a_{-1}/k$,

with $a^* > 0$ and $\prod a_i^* = k^{-k}$, where for any vector $v$, $v_{-1} = (v_1^{-1}, \dots, v_k^{-1})$. Furthermore, $a^* \in \mathcal{A}$ and $\min_{a\in\mathcal{A}} a\Sigma a' = a^*\Sigma a^{*\prime}$.

Equality can be attained in (3.5) whenever the bound $\le 1$; equality can always be attained in (3.6).

Proof. Inequalities (3.5), (3.6), and the fact that equality can be attained all follow from Theorem 3.1.

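By the remark following the proof of Lemma 3.3,

$$\inf_{a\in\mathcal{A}} a\Sigma a' = a^*\Sigma a^{*\prime} = 1/(w\Sigma^{-1}w') \ge 1/(t\Sigma^{-1}t')$$

for all $t \in T_+$, where $w = a^*\Sigma/(a^*\Sigma a^{*\prime}) \in T_+$. The minimum of $t\Sigma^{-1}t'$, $t \in T_+$, occurs at $t = w$, and $w$ must be a boundary point of $T_+$, so that $\prod w_i = 1$, $w > 0$. Minimization of $t\Sigma^{-1}t'$ via Lagrange's multiplier yields $2w\Sigma^{-1} = \lambda w_{-1}$.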







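Post-multiplication by $w'$ yields $\lambda = 2w\Sigma^{-1}w'/k$ (since $w_{-1}w' = k$) and the equation $kw\Sigma^{-1} = (w\Sigma^{-1}w')w_{-1}$. Since $a^* = w\Sigma^{-1}/(w\Sigma^{-1}w')$, we obtain $ka^* = w_{-1}$, so the minimizing $a^*$ must be a solution of $a_{-1} = kw = ka\Sigma/(a\Sigma a')$, which is (3.7). Furthermore, $1 = \prod w_i = 1/\prod(ka_i^*)$, so

$\prod a_i^* = k^{-k}$.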

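To show uniqueness, suppose there is another solution $u$ of (3.7) with $\prod u_i = k^{-k}$, $u > 0$; then $u \in \mathcal{A}$ (by the arithmetic-geometric mean inequality, $ut' \ge k(\prod u_it_i)^{1/k} \ge 1$ for $t \in T_+$), so $u\Sigma u' \ge a^*\Sigma a^{*\prime}$. Post-multiplication of (3.7) by $\Sigma^{-1}u_{-1}'$ yields $u_{-1}\Sigma^{-1}u_{-1}' = k^2/(u\Sigma u')$. Using the fact that the geometric mean is dominated by the arithmetic mean and then applying Cauchy's inequality, we obtain

$$k = k\prod\left(\frac{a_i^*}{u_i}\right)^{1/k} \le a^*u_{-1}' \le [(u_{-1}\Sigma^{-1}u_{-1}')(a^*\Sigma a^{*\prime})]^{1/2} = \left[\frac{k^2\,a^*\Sigma a^{*\prime}}{u\Sigma u'}\right]^{1/2} \le k.$$

Hence we have equality throughout, so that $u = a^*$. ||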
Corollary 3.6. Equation (3.7) has one and only one solution in each orthant, subject to $|\prod a_i| = c > 0$.
Proof. One can replace the positive orthant in the arguments of Theorem 3.5 by any other orthant. ||
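We now consider two special cases for which the bounds can be given explicitly.
Example 1. If $k = 2$, then $\min_{a\in\mathcal{A}} a\Sigma a' = (\sigma_1\sigma_2 + \sigma_{12})/2$.
Example 2. If the column sums of $\Sigma$ are all equal, then $e/k$ is a solution of (3.7) and $\min_{a\in\mathcal{A}} a\Sigma a' = (e\Sigma e')/k^2$, which is equal to $\sigma^2[1 + (k-1)\rho]/k$ when $\sigma_{ii} = \sigma^2$, $\sigma_{ij} = \sigma^2\rho$ $(i \ne j)$.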
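For general $\Sigma$ the solution of (3.7) has no closed form. One way to compute the bound, sketched below under our own choice of method (not one given in the text), uses the remark after Lemma 3.3: the minimum of $a\Sigma a'$ over $\mathcal{A}$ equals $1/\min t\Sigma^{-1}t'$ over $t > 0$ with $\prod t_i = 1$. Writing $t = e^s$ with $\sum s_i = 0$ turns this into an unconstrained problem, solved here by projected gradient descent with Armijo backtracking; the uniqueness part of Theorem 3.5 suggests the stationary point found is the minimizer. The name `product_bound` is hypothetical. The test values come from Example 1 ($k = 2$: $(\sigma_1\sigma_2 + \sigma_{12})/2$) and Example 2 (equal column sums: $a^* = e/k$).

```python
import numpy as np

def product_bound(Sigma, iters=10000, tol=1e-12):
    """Return (min over A of a Sigma a', a*) for T+ = {x > 0, prod x_i >= 1}.

    Minimizes t Sigma^{-1} t' over t = exp(s), sum(s) = 0, by projected
    gradient descent; then bound = 1/min and a* = w Sigma^{-1} / (w Sigma^{-1} w').
    """
    Sinv = np.linalg.inv(np.asarray(Sigma, dtype=float))
    k = Sinv.shape[0]

    def f(s):
        t = np.exp(s)
        return t @ Sinv @ t

    s = np.zeros(k)
    for _ in range(iters):
        t = np.exp(s)
        g = 2.0 * (Sinv @ t) * t      # gradient of f with respect to s
        g -= g.mean()                 # project onto the hyperplane sum(s) = 0
        gg = g @ g
        if gg < tol:
            break
        step, f0 = 1.0, f(s)
        for _ in range(60):           # Armijo backtracking line search
            if f(s - step * g) <= f0 - 0.5 * step * gg:
                break
            step *= 0.5
        s = s - step * g
    w = np.exp(s)
    m = w @ Sinv @ w
    return 1.0 / m, (w @ Sinv) / m

S2 = np.array([[1.0, 0.3], [0.3, 4.0]])                  # sigma_1 = 1, sigma_2 = 2
bound2, a2 = product_bound(S2)                           # Example 1: (2 + 0.3)/2
S3 = 2.0 * (0.75 * np.eye(3) + 0.25 * np.ones((3, 3)))   # equal column sums
bound3, a3 = product_bound(S3)                           # Example 2: 2(1 + 0.5)/3
```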


4. Some Related Bounds. 


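Theorem 4.1. If $X$ is a random vector with $EX'X = \Sigma$, where $\min_j \sigma_j^2 = \sigma_1^2$, then

(4.1) $P\{\min_j |X_j| \ge 1\} \le \sigma_1^2$,

(4.2) $P\{|\prod_1^k X_j| \ge 1\} \le \prod_1^k \sigma_j^{2/k}$,

(4.3) $P\{\prod_1^k X_j \ge 1\} \le \prod_1^k \sigma_j^{2/k}$.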
1 1 


Proof. Since $\{\min_j |X_j| \ge 1\} \subset \{|X_1| \ge 1\}$, (4.1) follows from (1.1).

Successive application of (1.1) and Hölder's inequality yields
$$P\{|\prod_1^k X_j|^{2/k} \ge 1\} \le E|\prod_1^k X_j|^{2/k} \le \prod_1^k (EX_j^2)^{1/k},$$
which is (4.2). The relation $\{\prod_1^k X_j \ge 1\} \subset \{|\prod_1^k X_j| \ge 1\}$ and (4.2) give (4.3).

We now consider the question of sharpness. Suppose that $\sigma_1^2 = \cdots = \sigma_k^2 = \sigma^2$, in which case all three inequalities can be proved by (2.1) with $f(z) = \sum_1^k z_j^2/k$. In order to satisfy (2.2) and (2.3), $C: 1 \times k$ must be the zero vector, and $W: n \times k$ must be a matrix with $w_{ij} = \pm 1$.

Matrices $H = (h_{ij})$: $m \times m$ with $h_{ij} = \pm 1$ and $HH' = mI$ are called Hadamard matrices. Various sufficient conditions for their existence can be found in







[1, 6, 7]; e.g., they exist if $m = 2^r(\pi + 1)$, where $\pi$ is an odd prime and $r$ is a positive integer. A necessary condition for their existence is that $m = 2$ or $m = 4t$ for some positive integer $t$. If $H$ is a Hadamard matrix, so is $HD$, where $D = \mathrm{diag}(\pm 1, \dots, \pm 1)$. Hence we can assume that the first row of $H$ is $e$. In this case all other rows $w^{(i)}$ have an equal number of positive and negative entries, because $ew^{(i)\prime} = 0$.

From (2.5) and the fact that $C$ consists of the zero vector, we know that attainment of equality depends on the solution of $W'DW = \Sigma$. Our use below of Hadamard matrices for $W$ stems from the fact that matrices $\Sigma$ of a certain class are diagonalized by Hadamard matrices with first row $e$.
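For orders that are powers of 2, Sylvester's doubling $H \mapsto \begin{pmatrix} H & H\\ H & -H\end{pmatrix}$ gives a Hadamard matrix whose first row is already $e$. This is only one special case of the existence results cited above, used here for illustration. The sketch also shows the normalization $H \mapsto HD$ with $D = \mathrm{diag}(\text{first row of } H)$; both function names are hypothetical.

```python
import numpy as np

def sylvester_hadamard(m):
    """Hadamard matrix of order m = 2**n by Sylvester's doubling; its first row is e."""
    if m < 1 or m & (m - 1):
        raise ValueError("this construction only covers orders that are powers of 2")
    H = np.array([[1]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

def normalize_first_row(H):
    """Return H D with D = diag(first row of H), so the first row becomes e."""
    return H * H[0]   # multiplies column j by H[0, j]

H = normalize_first_row(sylvester_hadamard(8))
H4 = normalize_first_row(sylvester_hadamard(4) * np.array([1, -1, 1, -1]))
```

After normalization, every row other than the first sums to zero, i.e. has equally many $+1$ and $-1$ entries.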

Theorem 4.2. Let $\sigma_{ii} = \sigma^2 \le 1$, $\sigma_{ij} = \sigma^2\rho$ $(i \ne j)$. (i) Equality can be attained in (4.1) and (4.2) if a Hadamard matrix of order $k$ exists or $\rho \ge 0$. (ii) Otherwise equality may not be attainable.

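Proof of (i). Any Hadamard matrix $W$ (with first row $e$) of order $k$ will diagonalize $\Sigma$; i.e.,

(4.4) $\Sigma = W'D_qW$,

where

$$D_q = \mathrm{diag}(q_1, \dots, q_k) = \mathrm{diag}\{[1 + (k-1)\rho], (1-\rho), \dots, (1-\rho)\}\,\sigma^2/k.$$

The characteristic roots of $\Sigma$ are $kq_i > 0$.
Consider the random vector $Z$ with

(4.5) $P\{Z = 0\} = 1 - \sigma^2$, $\quad P\{Z = w^{(i)}\} = P\{Z = -w^{(i)}\} = q_i/2$, $\quad (i = 1, 2, \dots, k)$,

where $w^{(i)}$ is the $i$th row of $W$. Clearly $\sum q_i = \sigma^2$, $EZ'Z = \Sigma$, and $w^{(i)} \in T$ when $T = \{\min_j |z_j| \ge 1\}$ or $\{|\prod z_j| \ge 1\}$.

If $\rho \ge 0$, let $\Sigma^* = \sigma^2[(1 - \rho)I + \rho e'e]: m \times m$, where $m \ge k$ is such that a Hadamard matrix of order $m$ exists. $\Sigma^*$ is p.d. (since $\rho \ge 0$, $1 + (m-1)\rho > 0$ for every $m$), its leading $k \times k$ block is $\Sigma$, and

$$\sigma^2 = P\{\min_{1\le i\le m}|Z_i| \ge 1\} \le P\{\min_{1\le i\le k}|Z_i| \ge 1\} \le \sigma^2,$$

$$\sigma^2 = P\{|\prod_1^m Z_i| \ge 1\} \le P\{|\prod_1^k Z_i| \ge 1\} \le \sigma^2,$$

where the distribution of $Z = (Z_1, \dots, Z_m)$ is given as in (4.5), with $\Sigma^*$ in place of $\Sigma$.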
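A small numerical check of (4.4) and (4.5) is sketched below for $k = 4$, using a Sylvester Hadamard matrix (so $k$ is assumed to be a power of 2) and illustrative values $\sigma^2 = 0.6$, $\rho = -0.2$. It verifies $W'D_qW = \Sigma$, $EZ'Z = \Sigma$, and that $Z$ puts probability exactly $\sigma^2$ on both $\{\min_j |z_j| \ge 1\}$ and $\{|\prod z_j| \ge 1\}$.

```python
import numpy as np

def sylvester_hadamard(m):
    """Hadamard matrix of order m = 2**n (Sylvester); first row is e."""
    H = np.array([[1]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

k, sigma2, rho = 4, 0.6, -0.2                  # illustrative; Sigma must be p.d.
W = sylvester_hadamard(k)
q = np.full(k, (1 - rho) * sigma2 / k)
q[0] = (1 + (k - 1) * rho) * sigma2 / k        # D_q of (4.4)
Sigma = sigma2 * ((1 - rho) * np.eye(k) + rho * np.ones((k, k)))

# Distribution (4.5): mass q_i/2 at +w^(i) and at -w^(i); the rest (1 - sigma^2) at 0.
points = np.vstack([W, -W])
probs = np.concatenate([q / 2, q / 2])
second = points.T @ (probs[:, None] * points)  # EZ'Z (the atom at 0 contributes nothing)
p_min = probs[np.abs(points).min(axis=1) >= 1].sum()     # P{min |Z_j| >= 1}
p_prod = probs[np.abs(points.prod(axis=1)) >= 1].sum()   # P{|prod Z_j| >= 1}
```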

Proof of (ii). Since $\{\min_j |X_j| \ge 1\} \subset \{|\prod X_j| \ge 1\}$, it is sufficient to prove that equality in (4.2) need not be attainable. Since $w_{ij} = \pm 1$, a random vector $Z$ for which equality is attained when $k = 3$ must have a distribution of the form $P\{\pm(1, 1, 1)\} = P\{(1, 1, 1) \text{ or } (-1, -1, -1)\} = p_1$, $P\{\pm(-1, 1, 1)\} = p_2$, $P\{\pm(1, -1, 1)\} = p_3$, $P\{\pm(1, 1, -1)\} = p_4$, $P\{0\} = 1 - \sigma^2$. The conditions $EZ'Z = \Sigma$ require $p_1 = \sigma^2(1 + 3\rho)/4$, $p_2 = p_3 = p_4 = \sigma^2(1 - \rho)/4$. $\Sigma$ is p.d. whenever $-\tfrac12 < \rho < 1$, and no distribution attaining equality in (4.2) exists when $-\tfrac12 < \rho < -\tfrac13$, since then $p_1 < 0$. ||
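The four conditions $EZ'Z = \Sigma$ for $k = 3$ form a linear system in $p_1, \dots, p_4$, with each $p_r$ split equally between $\pm v_r$. The sketch below solves it directly (the function name is hypothetical) and shows $p_1 < 0$ for a $\rho$ in $(-\tfrac12, -\tfrac13)$.

```python
import numpy as np

def k3_probabilities(sigma2, rho):
    """Solve EZ'Z = Sigma for (p1, p2, p3, p4) in the k = 3 argument."""
    signs = np.array([[1, 1, 1], [-1, 1, 1], [1, -1, 1], [1, 1, -1]])
    # E Z_i Z_j = sum_r p_r v_ri v_rj, since +v_r and -v_r give the same product.
    pairs = [(0, 0), (0, 1), (0, 2), (1, 2)]
    A = np.array([[v[i] * v[j] for v in signs] for i, j in pairs], dtype=float)
    b = np.array([sigma2, sigma2 * rho, sigma2 * rho, sigma2 * rho])
    return np.linalg.solve(A, b)

p_bad = k3_probabilities(0.8, -0.4)    # rho in (-1/2, -1/3): p1 = -0.04 < 0
p_ok = k3_probabilities(0.8, 0.2)      # rho >= 0: all p_r >= 0
```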

Theorem 4.3. Let $k > 2$ and $\sigma_{ii} = \sigma^2 \le 1$, $\sigma_{ij} = \sigma^2\rho$ $(i \ne j)$. Equality can be







attained in (4.3) whenever there exists a Hadamard matrix of order k; otherwise, 
equality may not be attainable. 

Remark. Since $\{X_1X_2 \ge 1\} = \{|X_1X_2| \ge 1,\ \mathrm{sign}\,X_1 = \mathrm{sign}\,X_2\}$, an improvement of (4.3) for $k = 2$ and this $\Sigma$ is given by (3.5) and Example 2 of Section 3.3.

Proof. A distribution attaining equality in (4.3) is given by (4.5). We need only show that $\pm w^{(i)} \in \{\prod z_j \ge 1\}$. The first row $w^{(1)}$ of $W$ is $e$, and all other rows of $W$ must have an equal number of positive and negative entries. Because $k$ is a multiple of 4, this means $w^{(i)}$ has an even number ($k/2$) of negative entries, so $\prod_j w_{ij} = 1$; since $k$ is even, the same holds for $-w^{(i)}$.

Equality cannot be attained in (4.3) if it cannot be attained in (4.2). ||


5. Bounds when only variances are known. 


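Theorem 5.1. If $X = (X_1, \dots, X_k)$ is a random vector with $EX = 0$ and $EX_j^2 = \sigma_j^2$, where $\sigma_1^2 \le \sigma_j^2$, $j = 1, 2, \dots, k$, then

(5.1) $P\{\min_j X_j \ge 1 \text{ or } \min_j(-X_j) \ge 1\} \le \sigma_1^2$,

(5.2) $P\{\min_j X_j \ge 1\} \le \sigma_1^2/(1 + \sigma_1^2)$,

(5.3) $P\{|\prod X_j| \ge 1 \text{ and } (X > 0 \text{ or } X < 0)\} \le \prod \sigma_j^{2/k}$,

(5.4) $P\{|\prod X_j| \ge 1 \text{ and } X > 0\} \le \prod \sigma_j^{2/k}/(1 + \prod \sigma_j^{2/k})$,

(5.5) $P\{\min_j |X_j| \ge 1\} \le \sigma_1^2$,

(5.6) $P\{|\prod X_j| \ge 1\} \le \prod \sigma_j^{2/k}$,

(5.7) $P\{\prod X_j \ge 1\} \le \prod \sigma_j^{2/k}$.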
Remark. The hypothesis $EX = 0$ is required only for (5.2) and (5.4).
Proof. If $T \subset T^* \subset R^k$ and $P\{X \in T^*\} \le p$, then trivially $P\{X \in T\} \le p$. In this manner, inequalities (5.1) and (5.5) follow from (1.1); (5.2) follows from (1.2); and (5.3) follows from (4.2). Inequalities (5.4), (5.6) and (5.7) follow respectively from (5.3) and Theorem 3.5, (3.11), (4.2) and (4.3). ||
Theorem 5.2. Equality can be attained in (5.1)-(5.6); equality can be attained in (5.7) if $k > 1$, whenever the bound $\le 1$.
Proof. Equality in each of (5.1)-(5.7) is achieved by one of the following distributions after a change of variable.

(i) $P\{Y = e\} = P\{Y = -e\} = \sigma^2/2$, $\quad P\{Y = 0\} = 1 - \sigma^2$,

(ii) $P\{Z = e\} = \sigma^2/(1 + \sigma^2)$, $\quad P\{Z = -\sigma^2e\} = 1/(1 + \sigma^2)$,

(iii) $P\{U = w^{(j)}\} = \sigma^2/(2k - 2)$, $j = 1, \dots, k$,
$\quad P\{U = e\} = \sigma^2(k - 2)/(2k - 2)$, $\quad P\{U = 0\} = 1 - \sigma^2$,
where $w^{(j)}$ is the $j$th row of $(2I - e'e): k \times k$.

Equality is achieved in (5.1) and (5.5) if $X_j = (\sigma_j/\sigma)Y_j$, and in (5.2) if $X_j = (\sigma_j/\sigma)Z_j$, where $\sigma = \sigma_1$.







Define $\sigma^2 = \prod_j(\sigma_j^2)^{1/k}$. Equality is achieved in (5.3), (5.6), and, for $k$ even, in (5.7) if $X_j = (\sigma_j/\sigma)Y_j$; in (5.4) if $X_j = (\sigma_j/\sigma)Z_j$; and in (5.7) for $k$ odd if $X_j = (\sigma_j/\sigma)U_j$. ||
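Distribution (iii) is the least obvious of the three. The sketch below (hypothetical function name, illustrative $k = 5$ and $\sigma^2 = 0.7$) checks that its probabilities sum to 1, that $EU = 0$ and $EU_j^2 = \sigma^2$, and that for odd $k$ it puts probability exactly $\sigma^2$ on $\{\prod U_j \ge 1\}$. Each row $w^{(j)}$ has one $+1$ and $k - 1$ entries equal to $-1$, so its product is $+1$ when $k$ is odd.

```python
import numpy as np

def distribution_iii(k, sigma2):
    """Support and probabilities of U in (iii) of the proof of Theorem 5.2 (k >= 2)."""
    W = 2 * np.eye(k) - np.ones((k, k))          # row j: +1 in place j, -1 elsewhere
    points = np.vstack([W, np.ones(k), np.zeros(k)])
    probs = np.concatenate([np.full(k, sigma2 / (2 * k - 2)),
                            [sigma2 * (k - 2) / (2 * k - 2), 1 - sigma2]])
    return points, probs

points, probs = distribution_iii(5, 0.7)          # k odd, sigma^2 <= 1
mean = probs @ points                             # EU
variances = probs @ points ** 2                   # (EU_1^2, ..., EU_k^2)
p_event = probs[points.prod(axis=1) >= 1].sum()   # P{prod U_j >= 1}
```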


6. Analogs of Kolmogorov’s Inequality. The following theorem restates some 
of the previously proven inequalities with the hypotheses strengthened so that 
they become, in a sense, analogs of Kolmogorov’s inequality. Of course, no 
added hypothesis can destroy the validity of an inequality, but it may destroy 
sharpness by permitting a better bound. For the following inequalities, we show 
that this is not the case. 

Theorem 6.1. If $Y_1, \dots, Y_k$ are mutually independent random variables with $E(Y_j) = 0$ and $E(Y_j^2) = \sigma_j^2$, $j = 1, \dots, k$, and $X_j = \sum_{i=1}^j Y_i$, then

(6.1) $P\{\min_j X_j \ge 1 \text{ or } \min_j(-X_j) \ge 1\} \le \sigma_1^2$,

(6.2) $P\{\min_j X_j \ge 1\} \le \sigma_1^2/(1 + \sigma_1^2)$,

(6.3) $P\{\min_j |X_j| \ge 1\} \le \sigma_1^2$.

Proof. (6.1), (6.2), (6.3) follow from (3.5), (3.6), and (4.1), respectively. Direct proofs are immediate since $\{\min_j X_j \ge 1 \text{ or } \min_j(-X_j) \ge 1\} \subset \{\min_j |X_j| \ge 1\} \subset \{|X_1| \ge 1\}$, and $\{\min_j X_j \ge 1\} \subset \{X_1 \ge 1\}$. ||

We now show that the above inequalities are sharp. Inequalities (6.1) and (6.2) are the only inequalities that we prove sharp without showing that equality is attainable. (See Section 2 for a clarification of this distinction.) Indeed, we show that unless $\sigma_2^2 = \cdots = \sigma_k^2 = 0$, equality cannot be attained in (6.1) and (6.2), so that the probabilities of these inequalities are strictly less than the given bounds.

Theorem 6.2. Inequalities (6.1), (6.2) and (6.3) are sharp. Equality in (6.1) and (6.2) can be attained only if $\sigma_2 = \cdots = \sigma_k = 0$. Equality in (6.3) can be attained whenever the bound $\le 1$.

Proof.

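Case of (6.1). Let $0 < \delta < 1$ and let $Z = (Z_1, \dots, Z_k)$ be a random vector with mutually independent components such that

$P\{Z_1 = 1\} = P\{Z_1 = -1\} = \sigma_1^2/2$, $\quad P\{Z_1 = 0\} = 1 - \sigma_1^2$,
$P\{Z_j = \sigma_j/\delta\} = P\{Z_j = -\sigma_j/\delta\} = \delta^2/2$, $\quad P\{Z_j = 0\} = 1 - \delta^2$, $\quad j = 2, \dots, k$.

Then

$$P\left\{\min_j \sum_{i=1}^j Z_i \ge 1 \text{ or } \min_j\left(-\sum_{i=1}^j Z_i\right) \ge 1\right\} \ge P\{Z_1 = 1 \text{ or } Z_1 = -1\}\prod_{j=2}^k P\{Z_j = 0\} = \sigma_1^2(1 - \delta^2)^{k-1},$$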







which approaches $\sigma_1^2$ as $\delta \to 0$. Since $EZ_j = 0$ and $EZ_j^2 = \sigma_j^2$, $j = 1, \dots, k$, (6.1) is sharp.
To attain equality in (6.1), i.e., in

(6.4) $P\{\min_j X_j \ge 1 \text{ or } \min_j(-X_j) \ge 1\} \le P\{|X_1| \ge 1\} \le \sigma_1^2$,

it must be that equality is attained in the right-hand inequality. By (2.2), this means that if the random vector $Z$ attains equality, $P\{Z_1 = 1 \text{ or } -1 \text{ or } 0\} = 1$, and since $EZ_1 = 0$, $P\{Z_1 = 1\} = P\{Z_1 = -1\} = \sigma_1^2/2$, $P\{Z_1 = 0\} = 1 - \sigma_1^2$. Suppose that $i$ is the smallest index $(i > 1)$ for which $\sigma_i^2 > 0$. Then $P\{Z_j = 0\} = 1$, $j = 2, \dots, i - 1$, but $Z_i$ must assume some value $v \ne 0$ with positive probability. If $v > 0$ and if $Z_1 = -1$, $Z_i = v$, then $(Z_1, \dots, Z_k) \notin T$ because $Z_1 < 1$ and $-(Z_1 + \cdots + Z_i) = 1 - v < 1$. But $P\{Z_1 = -1, Z_i = v\} = P\{Z_1 = -1\}P\{Z_i = v\} > 0$, while $|Z_1| \ge 1$ on this event. This means equality is not attained in the left-hand inequality of (6.4). A similar argument holds when $v < 0$.

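Case of (6.2). Let $Z$ be a random vector with mutually independent components such that $P\{Z_1 = 1\} = \sigma_1^2/(1 + \sigma_1^2)$, $P\{Z_1 = -\sigma_1^2\} = 1/(1 + \sigma_1^2)$, $P\{Z_j = \delta\} = \sigma_j^2/(\delta^2 + \sigma_j^2)$, $P\{Z_j = -\sigma_j^2/\delta\} = \delta^2/(\delta^2 + \sigma_j^2)$, $j = 2, \dots, k$. Then if $\delta > 0$,

$$P\left\{\min_j (Z_1 + \cdots + Z_j) \ge 1\right\} \ge \frac{\sigma_1^2}{1 + \sigma_1^2}\prod_{j=2}^k \frac{\sigma_j^2}{\delta^2 + \sigma_j^2},$$

which approaches $\sigma_1^2/(1 + \sigma_1^2)$ as $\delta \to 0$, so that (6.2) is sharp.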

The argument that equality cannot be attained in (6.2) is essentially the same as for (6.1). In this case a random vector $Z$ attaining equality requires $P\{Z_1 = 1\} = \sigma_1^2/(1 + \sigma_1^2)$ and $P\{Z_1 = -\sigma_1^2\} = 1/(1 + \sigma_1^2)$.

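Case of (6.3). Let $Z_j$, $j = 1, \dots, k$, be mutually independent random variables such that $P\{Z_j = 2^{j-1}\} = P\{Z_j = -2^{j-1}\} = \sigma_j^2/2^{2j-1}$, $P\{Z_j = 0\} = 1 - \sigma_j^2/2^{2j-2}$ $(j = 1, \dots, k)$; then $EZ_j = 0$, $EZ_j^2 = \sigma_j^2$ for all $j$, and $P\{\min_j |Z_1 + \cdots + Z_j| \ge 1\} = P\{|Z_1| \ge 1\} = \sigma_1^2$ (if $Z_1 \ne 0$, each partial sum is a sum of distinct signed powers of 2 whose largest term dominates the rest, so it is at least 1 in absolute value; if $Z_1 = 0$, the first partial sum is 0). Hence equality is attained in (6.3) whenever $Y$ has the same distribution as $Z$.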
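The construction for (6.3) can be checked by exact enumeration of all $3^k$ outcomes. The sketch below assumes the two-point-plus-zero laws on $\pm 2^{j-1}$ given above, and therefore requires $\sigma_j^2 \le 4^{j-1}$ so that the masses are valid. It confirms that $P\{\min_j |X_j| \ge 1\} = \sigma_1^2$ and $EZ_j^2 = \sigma_j^2$ for illustrative variances. The function name is hypothetical.

```python
import itertools
import numpy as np

def case_63(sigma2):
    """Exact P{min_j |Z_1 + ... + Z_j| >= 1} and the variances of the Z_j.

    Z_j = +2^(j-1) or -2^(j-1), each w.p. sigma_j^2 / 2^(2j-1), else 0.
    Requires sigma_j^2 <= 4^(j-1) so that the probabilities are valid.
    """
    supports = []
    for j, s2 in enumerate(sigma2, start=1):
        c = 2.0 ** (j - 1)
        p = s2 / 2.0 ** (2 * j - 1)
        if 1 - 2 * p < 0:
            raise ValueError("sigma_j^2 too large for this sketch")
        supports.append([(c, p), (-c, p), (0.0, 1 - 2 * p)])
    prob_event = 0.0
    for outcome in itertools.product(*supports):
        values = np.array([v for v, _ in outcome])
        weight = np.prod([pr for _, pr in outcome])
        if np.min(np.abs(np.cumsum(values))) >= 1:
            prob_event += weight
    variances = [sum(pr * v * v for v, pr in sup) for sup in supports]
    return prob_event, variances

prob, variances = case_63([0.3, 1.5, 2.0])
```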

Since $\{\min_j X_j \ge 1 \text{ or } \min_j(-X_j) \ge 1\} \subset \{\min_j |X_j| \ge 1\}$, sharpness of (6.1) implies sharpness of (6.3), but does not imply that equality can be attained in (6.3), as we have just proved.

The inequalities of Theorem 6.1 can be obtained by specializing previous results to the case that $Y_1, \dots, Y_k$ are uncorrelated, then adding the hypothesis that $Y_1, \dots, Y_k$ are independent. The same procedure when applied to Lal's inequality [2] yields a bound for $P\{|Y_1| \ge 1 \text{ or } |Y_1 + Y_2| \ge 1\}$ with $Y_1, Y_2$ independent; in curious contrast to our case, it is not sharp, since Kolmogorov's inequality provides a better bound.


7. Some Extensions. There are a number of methods by which the results can be extended. We mention only a few and give some partial results.
7.1. Extensions to Stochastic Processes. If $\{X_t, t \in T\}$ is a real stochastic process







with $EX_t = 0$ for all $t \in T$, then

(7.1) $P\{\inf_{t\in T}|X_t| \ge 1\} \le \inf_{t\in T}EX_t^2 = p$,

(7.2) $P\{\inf_{t\in T}X_t \ge 1 \text{ or } \sup_{t\in T}X_t \le -1\} \le p$,

(7.3) $P\{\inf_{t\in T}X_t \ge 1\} \le p/(1 + p)$

whenever the probabilities are defined. The proof is analogous to that of Theorem 6.1.

In view of Theorem 3.4, we can hope sometimes to improve (7.2) and (7.3) 
if the covariance function of the process is known. General results are not easily 
obtained. We content ourselves with a single example for which (7.2) and (7.3) 
can be improved and concentrate on showing that no improvement is possible 
if the process is a martingale. 

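Theorem 7.1. If $T$ is not finite and $\{X_t, t \in T\}$ is a process with $EX_t = 0$ and $EX_t^2 = \sigma^2$, $EX_sX_t = \sigma^2\rho$ $(s \ne t)$, for all $s, t \in T$ (where $0 < \rho < 1$), then

(7.4) $P\{\inf_{t\in T}X_t \ge 1 \text{ or } \sup_{t\in T}X_t \le -1\} \le \sigma^2\rho$,

(7.5) $P\{\inf_{t\in T}X_t \ge 1\} \le \sigma^2\rho/(1 + \sigma^2\rho)$

whenever the probabilities are defined.
Proof. This follows from Example 2 of Section 3.2: for any $k$ distinct points of $T$, (3.3) and (3.4) give the bounds $q_k$ and $q_k/(1 + q_k)$ with $q_k = \sigma^2[1 + (k-1)\rho]/k$, and $q_k$ decreases to $\sigma^2\rho$ as $k \to \infty$. ||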
Theorem 7.2. If $\{X_t, t \in T = [0, r]\}$ is a martingale with $EX_t = 0$ and $EX_t^2 = \sigma^2(t)$, then

(7.6) $P\{\inf_{t\in T}X_t \ge 1 \text{ or } \sup_{t\in T}X_t \le -1\} \le \sigma^2(0)$,

(7.7) $P\{\inf_{t\in T}X_t \ge 1\} \le \sigma^2(0)/[1 + \sigma^2(0)]$

whenever the probabilities are defined.

Equality is attainable in both of these inequalities if $\sigma^2(\cdot)$ is right continuous.

Remark. Inequalities (7.6) and (7.7) remain true if we replace the condition 
that the process is a martingale by the condition that the process has covariance 
$EX_sX_t = \sigma^2(s)$, $s \le t$ (of course, $\sigma^2(\cdot)$ must be non-decreasing). This is the
case, e.g., if the process has orthogonal increments, and with this replacement 
the theorem would generalize Example 5 of Section 3.2. We have not chosen to 
weaken the conditions of the theorem because to do so would weaken the result 
that equality is attainable. 

Proof. (7.6) and (7.7) are immediate consequences of (7.2) and (7.3), since the variance $\sigma^2(\cdot)$ of a martingale is non-decreasing, so that $\inf_t EX_t^2 = \sigma^2(0)$. We now indicate how a martingale $\{Z_t, t \in [0, r]\}$ can be defined attaining equality in (7.6). Let $\theta$ be a random time with its distribution to be chosen later. Let the sample functions $Z_t$ be zero for $t < \theta$, and for $t \ge \theta \ne 0$, let $Z_t = a$ or $-a$ with equal probability. If $\theta = 0$, let $Z_t$ be constant at 1 or $-1$ with equal probability.







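Obviously $Z_t$ is a martingale and $EZ_t = 0$. $EZ_t^2 = P\{\theta = 0\} + a^2P\{0 < \theta \le t\}$, and this is $\sigma^2(t)$ if $\theta$ has the distribution $P\{\theta = 0\} = \sigma^2(0)$, $P\{\theta \le t\} = [\sigma^2(t) - \sigma^2(0)]/a^2 + \sigma^2(0)$. Choosing $a$ so that $P\{\theta \le r\} = 1$ gives
$$a = \left[\frac{\sigma^2(r) - \sigma^2(0)}{1 - \sigma^2(0)}\right]^{1/2},$$
which is less than one. Hence

$$P\{\inf_{t\in[0,r]}Z_t \ge 1 \text{ or } \sup_{t\in[0,r]}Z_t \le -1\} = P\{\theta = 0\} = \sigma^2(0),$$

so that equality is attained in (7.6).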

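To attain equality in (7.7), again let $\theta$ be random in $[0, r]$ and let $Z_t = 1$ if $\theta = 0$. If $\theta \ne 0$, let $Z_t = -\sigma^2(0)$ for $t < \theta$, and for $t \ge \theta$ let $Z_t$ be $-\sigma^2(0) + \beta$ or $-\sigma^2(0) - \beta$ with equal probability. If $\beta = \{[1 + \sigma^2(0)][\sigma^2(r) - \sigma^2(0)]\}^{1/2}$ and $\theta$ has distribution

$$P\{\theta = 0\} = \frac{\sigma^2(0)}{1 + \sigma^2(0)}, \qquad P\{\theta \le t\} = \frac{1}{1 + \sigma^2(0)}\left[\sigma^2(0) + \frac{\sigma^2(t) - \sigma^2(0)}{\sigma^2(r) - \sigma^2(0)}\right],$$

it is easily verified that the process $Z_t$ has the desired properties. ||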
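The "easily verified" moment conditions for this process can be checked directly. Conditioning on $\theta$: if $\theta = 0$ then $Z_t = 1$; if $\theta > t$ then $Z_t = -\sigma^2(0)$; if $0 < \theta \le t$ then $Z_t = -\sigma^2(0) \pm \beta$. The sketch below computes $EZ_t$ and $EZ_t^2$ this way for an illustrative non-decreasing variance function (an assumption of the example) and confirms $EZ_t = 0$, $EZ_t^2 = \sigma^2(t)$ and $P\{\theta \le r\} = 1$. The function names are hypothetical.

```python
def theta_cdf(sigma2, r, t):
    """P{theta <= t} for the random time used to attain equality in (7.7)."""
    s0, sr = sigma2(0.0), sigma2(r)
    return (s0 + (sigma2(t) - s0) / (sr - s0)) / (1.0 + s0)

def moments_77(sigma2, r, t):
    """(E Z_t, E Z_t^2) for the process attaining equality in (7.7)."""
    s0, sr = sigma2(0.0), sigma2(r)
    beta2 = (1.0 + s0) * (sr - s0)
    p0 = s0 / (1.0 + s0)                    # P{theta = 0}: then Z_t = 1
    jumped = theta_cdf(sigma2, r, t) - p0   # P{0 < theta <= t}: Z_t = -s0 +/- beta
    mean = p0 * 1.0 + (1.0 - p0) * (-s0)    # the +/- beta jumps average out
    second = p0 + (1.0 - p0) * s0 ** 2 + jumped * beta2
    return mean, second

def sigma2(t):
    """Illustrative non-decreasing variance function with sigma^2(0) = 0.25."""
    return 0.25 + t ** 2

r = 1.0
```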

7.2. Extensions Through Transformations. Let $X$ be a random variable with $EX = 0$, $EX^2 = \sigma^2$. By means of the linear transformation $y = \eta x + \mu$, $\eta > 0$, one can obtain Chebyshev's inequality in its usual generality from (1.1) with
e= 1, 

Multivariate Chebyshev-type inequalities with hypotheses concerning means 
and covariances can be extended similarly by linear transformations, and, in 
fact, the possibilities are much greater than in the univariate case. 

Let $X$ be a random vector with $EX = 0$, $EX'X = \Sigma$, and suppose that one has the inequality

(7.8) $P\{X \in T\} \le p(\Sigma)$.

If $H$ is a non-singular matrix, then using the transformation $y = xH^{-1} + \mu$ one obtains

(7.9) $P\{Y \in S\} \le p(H'\Pi H)$,

whenever $Y$ is a random vector with $EY = \mu$, $E(Y - \mu)'(Y - \mu) = \Pi$, and $S = \{y: (y - \mu)H \in T\}$.

Clearly, (7.9) is sharp whenever (7.8) is sharp. 

Non-linear transformations may also be useful; e.g., with $Y_j = X_j^2$, $j = 1, \dots, k$, the results of Section 5 yield corresponding results for positive random variables in terms of their expectations.

7.3. Bounds for Subsets. It is immediate that if $T_1 \subset T_2$ and (i) $P(T_2) \le p$, then (ii) $P(T_1) \le p$. Obviously, if (ii) is sharp, then (i) is sharp, and it is perhaps surprising that in many cases (ii) is sharp. Examples of this are (4.1), (5.1), and (5.2).

As a further application, let us consider inequality (3.4), but suppose that some entries of the covariance matrix $\Sigma$ are unknown. Then we can consider subvectors $(X_{i_1}, \dots, X_{i_m})$ of $(X_1, \dots, X_k)$ for which the corresponding covariance matrix is known and apply (3.4), together with

$$P\{\min_{1\le i\le k} X_i \ge 1\} \le P\{\min_{1\le j\le m} X_{i_j} \ge 1\}.$$

Whether this procedure (which can also be applied to (3.3)) leads to sharp inequalities is not known, but in Section 5 we proved sharpness when only the diagonal elements of $\Sigma$ are known. This procedure can be used whenever at least one diagonal element of $\Sigma$ is known.


8. Acknowledgment. The authors are indeed grateful to H. Chernoff, S. Karlin, and John Pratt for suggesting proofs of Lemma 3.3, and for many
valuable comments and discussions. 

REFERENCES 

[1] R. C. Bose and S. S. Shrikhande, "A note on a result in the theory of code construction," Information and Control, Vol. 2 (1959), pp. 183-194.

[2] D. N. Lal, "A note on a form of Tchebycheff's inequality for two or more variables," Sankhyā, Vol. 15 (1955), pp. 317-320.

[3] Albert W. Marshall, "On the growth of stochastic processes," Technical Report No. 32, Department of Math., Univ. of Washington, 1958, pp. 1-77.

[4] Albert W. Marshall and Ingram Olkin, "A one-sided inequality of the Chebyshev type," Ann. Math. Stat., Vol. 31 (1960), pp. 488-491.

[5] Ingram Olkin and John W. Pratt, "A multivariate Tchebycheff inequality," Ann. Math. Stat., Vol. 29 (1958), pp. 226-234.

[6] R. E. A. C. Paley, "On orthogonal matrices," J. Math. and Phys., Vol. 12 (1933), pp. 311-320.

[7] R. L. Plackett and J. P. Burman, "The design of optimum multifactorial experiments," Biometrika, Vol. 33 (1946), pp. 305-325.

[8] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937.

[9] P. Whittle, "A multivariate generalization of Tchebichev's inequality," Quart. J. Math., Oxford 2nd Ser., Vol. 9 (1958), pp. 232-240.





ON DEVIATIONS OF THE SAMPLE MEAN 


By R. R. Bahadur and R. Ranga Rao
Indian Statistical Institute, Calcutta 


1. Introduction. Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables. Let $a$ be a constant, $-\infty < a < \infty$, and for each $n = 1, 2, \dots$ let

(1) $p_n = P\left(\dfrac{X_1 + \cdots + X_n}{n} \ge a\right)$.
It is assumed throughout the paper that the distribution of $X_1$ and the given constant $a$ satisfy the conditions stated in the following paragraph. These conditions imply that $p_n > 0$ for each $n$, and that $p_n \to 0$ as $n \to \infty$. The object of the paper is to obtain an estimate of $p_n$, say $q_n$, which is precise in the sense that

(2) $q_n/p_n = 1 + o(1)$ as $n \to \infty$.


Let $t$ be a real variable, and let $\varphi(t)$ denote the moment generating function (m.g.f.) of $X_1$, i.e., $\varphi(t) = E(e^{tX_1})$, $-\infty < t < \infty$. Define

(3) $\psi(t) = e^{-at}\varphi(t)$.

Let $T$ denote the set of all values $t$ for which $\varphi(t) < \infty$. We suppose that $P(X_1 = a) \ne 1$, that $T$ is a non-degenerate interval, and that there exists a positive $\tau$ in the interior of $T$ such that $\psi(\tau) = \inf_t\{\psi(t)\} = \rho$ (say). These conditions are satisfied if, for example, $\varphi(t) < \infty$ for all $t$, $E(X_1) = 0$, $a > 0$, and $P(X_1 > a) > 0$. In any case, $\tau$ and $\rho$ are uniquely determined by

(4) $\dfrac{\varphi'(\tau)}{\varphi(\tau)} = a$ and $\rho = \psi(\tau)$,

where $\varphi' = d\varphi/dt$, and we have $0 < \rho < 1$.

There are three separate cases to be considered. 

Case 1: The distribution function (d.f.) of $X_1$ is absolutely continuous, or, more generally, this d.f. satisfies Cramér's condition (C) [1, p. 81].

Case 2: $X_1$ is a lattice variable, i.e., there exist constants $x_0$ and $d > 0$ such that $X_1$ is confined to the set $\{x_0 + rd: r = 0, \pm 1, \pm 2, \dots\}$ with probability one.

Case 3: Neither Case 1 nor Case 2 obtains. 

We can now state 


Theorem 1. There exists a sequence $b_1, b_2, \dots$ of positive numbers $b_n$ such that

(5) $p_n = \dfrac{\rho^n}{(2\pi n)^{1/2}}\,b_n[1 + o(1)]$, $\quad \log b_n = O(1)$


Received September 29, 1959; revised July 13, 1960. 





1016 R. R. BAHADUR AND R. RANGA RAO 


as $n \to \infty$. In Cases 1 and 3, $b_n$ is independent of $n$. This last also holds in Case 2 if $P(X_1 = a) > 0$.
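Theorem 1 can be illustrated in the simplest Case 1 example, $X_1 \sim N(0, 1)$ with $a > 0$. Here $\varphi(t) = e^{t^2/2}$, so (4) gives $\tau = a$ and $\rho = e^{-a^2/2}$; the variance of the tilted variable is 1 and $\alpha = \sigma\tau = a$. The sketch assumes the well-known Case 1 value $b_n = 1/\alpha$ (its derivation lies outside this excerpt) and compares the exact $p_n = 1 - \Phi(a\sqrt n)$ with the leading term of (5). The ratio stays below 1 and tends to 1, consistent with Mills' ratio. The function name is hypothetical.

```python
import math

def normal_case(a, n):
    """Return (p_n, leading term of (5)) when X_1 ~ N(0, 1) and a > 0.

    tau = a, rho = exp(-a^2/2), sigma = 1, alpha = a; b_n = 1/alpha is assumed.
    """
    p_n = 0.5 * math.erfc(a * math.sqrt(n / 2.0))   # 1 - Phi(a sqrt(n))
    rho = math.exp(-a * a / 2.0)
    leading = rho ** n / (math.sqrt(2.0 * math.pi * n) * a)
    return p_n, leading

ratios = [p / lead for p, lead in (normal_case(0.5, n) for n in (10, 100, 1000))]
```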

The proof of Theorem 1, and of Theorem 2 below, is given in Sections 2-5. The present determination of $b_n$ is given by (4), (9) and (33) in Cases 1 and 3, and by (4), (8), (37), (38) and (46) in Case 2. The following refinements of Theorem 1 are available in Cases 1 and 2:

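Theorem 2. (Cases 1 and 2). For each $j = 1, 2, \dots$ there exists a bounded (possibly constant) sequence $c_{j1}, c_{j2}, \dots$ such that, for any given positive integer $k$,

(6) $p_n = \dfrac{\rho^n}{(2\pi n)^{1/2}}\,b_n\left[1 + \dfrac{c_{1n}}{n} + \dfrac{c_{2n}}{n^2} + \cdots + \dfrac{c_{kn}}{n^k} + O\left(\dfrac{1}{n^{k+1}}\right)\right]$

as $n \to \infty$.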

The sequences $\{c_{jn}\}$ are given explicitly for Cases 1 and 2 in Sections 3 and 4 respectively. It would be interesting to know whether (6) holds in Case 3 as well, perhaps with the $\{c_{jn}\}$ determined according to the formula for Case 1.

Estimates in the form (5) or (6) were first obtained by Cramér [2, pp. 20-21] in the case when $X_1$ has an absolutely continuous component (so that Case 1 obtains). Cramér showed that in the latter case (6) holds for every $k$ (with $b_n$ and each $c_{jn}$ independent of $n$), and determined $b_n$. Our method of proof in the general case (cf. Sections 2-5) is essentially a variant or extension of Cramér's method. Case 2 was treated recently by Blackwell and Hodges [3] by a different method. It is shown in [3] that (6) holds for $k = 1$ in Case 2, under the restriction on $n$ and $a$ that $P(X_1 + \cdots + X_n = na) > 0$ for every admissible $n$, and the requisite $b_n$ and $c_{1n}$ (which are then independent of $n$) are determined explicitly. Some other references bearing on the problem under consideration are [4], [5] and [6].

In the following Section 2 it is shown that $p_n$ can be expressed as $\rho^nI_n$, where $I_n$ is a certain integral; $0 < I_n < 1$, and $I_n = O(n^{-1/2})$ as $n \to \infty$. $I_n$ can be estimated by application of certain refinements [1], [7] of the central limit theorem. This estimation of $I_n$ is carried out in Sections 3, 4 and 5 for Cases 1, 2 and 3 respectively. It may be added here that, as was pointed out in [2], direct application of the central limit theorem (or refinements thereof) to $p_n$ defined by (1) does not, in general, yield approximations $q_n$ which satisfy (2).

In Section 6 we describe certain numerical approximations to p, which are 
suggested by Theorems 1 and 2 and their proofs. 


2. Lemmas. Let $Y_j = X_j - a$, and let $F$ be the (left-continuous) distribution function (d.f.) of $Y_1$, $F(y) = P(Y_1 < y)$. Let $G$ be defined by $G(z) = \rho^{-1}\int_{-\infty<y<z} e^{\tau y}\,dF(y)$. Since $E(e^{\tau Y_1}) = \psi(\tau) = \rho$, it is clear that $G$ is a probability d.f. Let $Z_\tau$ be a random variable distributed according to $G$.

Lemma 1. The m.g.f. of $Z_\tau$ exists in a neighborhood of the origin. We have

(7) $E(Z_\tau) = 0$, $\quad 0 < \mathrm{Var}(Z_\tau) < \infty$.

Proof. Let $\xi(t)$ denote the m.g.f. of $Z_\tau$. Then $\xi(t) = \psi(\tau + t)/\rho$ for all $t$,





DEVIATIONS OF THE SAMPLE MEAN 1017 


by (3) and the definition of $Z_\tau$. Since $\psi(t) < \infty$ in a neighborhood of $t = \tau$, it follows that $\xi(t) < \infty$ in a neighborhood of $t = 0$. Consequently, $E|Z_\tau|^r < \infty$ for $r = 1, 2, 3, \dots$ and $E(Z_\tau^r) = \{d^r\xi/dt^r\}_{t=0}$. In particular, $E(Z_\tau) = \{d\xi/dt\}_{t=0} = \psi'(\tau)/\rho = 0$, since $\psi(t)$ is minimum at $t = \tau$, and $\tau$ is in the interior of $T$. It remains to show that $\mathrm{Var}(Z_\tau) > 0$. Suppose to the contrary that $\mathrm{Var}(Z_\tau) = 0$; then $P(Z_\tau = 0) = 1$; hence $P(Y_1 = 0) = 1$, i.e., $P(X_1 = a) = 1$, which is contrary to our assumptions. This completes the proof.


Let $\mathrm{Var}(Z_\tau)$ be denoted by $\sigma^2$. It follows from the preceding paragraph and (4) that

(8) $\sigma^2 = \dfrac{\varphi''(\tau)}{\varphi(\tau)} - a^2$.

Define

(9) $\alpha = \sigma\tau$, $\quad (0 < \alpha < \infty)$.


Let $Z_1, Z_2, \dots$ be a sequence of independent random variables, each distributed according to $G$. For each $n$, let

(10) $U_n = \dfrac{Z_1 + \cdots + Z_n}{\sigma n^{1/2}}$

and

(11) $H_n(x) = P(U_n < x)$, $\quad (-\infty < x < \infty)$.


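Lemma 2. $p_n = \rho^nI_n$, where

(12) $I_n = \alpha n^{1/2}\displaystyle\int_0^\infty e^{-\alpha n^{1/2}x}[H_n(x) - H_n(0)]\,dx$.

Proof. Let $Y_j = X_j - a$ for $j = 1, 2, \dots, n$. Then
$$p_n = P(Y_1 + \cdots + Y_n \ge 0) = \int\cdots\int_{y_1+\cdots+y_n\ge 0} dF(y_1)\cdots dF(y_n)$$
$$= \rho^n\int\cdots\int_{z_1+\cdots+z_n\ge 0} e^{-\tau(z_1+\cdots+z_n)}\,dG(z_1)\cdots dG(z_n)$$
$$= \rho^n\int_{0\le x<\infty} e^{-\alpha n^{1/2}x}\,dH_n(x) \quad \text{by (9), (10), (11)}$$
(13) $= \rho^nI_n^*$, say.
It follows by integration by parts that $I_n^*$ defined in (13) is equal to $I_n$, and this completes the proof.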
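Lemma 2 can be checked numerically in the normal case $X_1 \sim N(0, 1)$, $a > 0$. There $\tau = a$ and $\rho = e^{-a^2/2}$; tilting $Y_1 = X_1 - a \sim N(-a, 1)$ by $e^{\tau y}$ gives $Z_\tau \sim N(0, 1)$, so $\sigma = 1$, $\alpha = a$, and $H_n = \Phi$ exactly. The sketch evaluates (12) by composite Simpson's rule (our choice of quadrature) and compares $\rho^nI_n$ with the exact $p_n = 1 - \Phi(a\sqrt n)$. The function name is hypothetical.

```python
import math

def lemma2_normal(a, n, panels=20000):
    """Return (p_n, rho^n I_n) for X_1 ~ N(0, 1); Lemma 2 says they agree."""
    c = a * math.sqrt(n)                       # alpha n^{1/2} with alpha = a

    def Phi(x):
        return 0.5 * math.erfc(-x / math.sqrt(2.0))

    def integrand(x):                          # alpha n^{1/2} e^{-c x} [H_n(x) - H_n(0)]
        return c * math.exp(-c * x) * (Phi(x) - 0.5)

    upper = 60.0 / c                           # beyond this the integrand is below e^{-60}
    h = upper / panels                         # panels must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * integrand(i * h)
    I_n = total * h / 3.0
    p_n = 0.5 * math.erfc(c / math.sqrt(2.0))
    return p_n, math.exp(-a * a / 2.0) ** n * I_n

checks = [lemma2_normal(a, n) for a, n in ((0.5, 1), (0.5, 20), (1.0, 10))]
```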
A theorem of Chernoff [4] states that $p_n \le \rho^n$ for every $n$, and that for any







given positive $\rho_0 < \rho$, we have $p_n \ge \rho_0^n$ for all sufficiently large $n$. A simple proof of Chernoff's theorem can be given as follows. Since $0 \le H_n(x) - H_n(0) \le 1$ for every $n$ and $x \ge 0$, we have $I_n \le 1$ and hence $p_n \le \rho^n$ for every $n$, by Lemma 2. To establish the second part of the theorem, we note first that $\lim_{n\to\infty}H_n(x) = \Phi(x)$ for every $x$, where

(14) $\Phi(x) = \displaystyle\int_{-\infty}^x (2\pi)^{-1/2}e^{-t^2/2}\,dt$, $\quad (-\infty < x < \infty)$,

by (7), (10), (11) and the central limit theorem. Let $\epsilon$ be a positive constant. Then

$$I_n \ge \alpha n^{1/2}\int_\epsilon^\infty e^{-\alpha n^{1/2}x}[H_n(x) - H_n(0)]\,dx$$
$$\ge [H_n(\epsilon) - H_n(0)]\,\alpha n^{1/2}\int_\epsilon^\infty e^{-\alpha n^{1/2}x}\,dx$$
$$= [H_n(\epsilon) - H_n(0)]\,e^{-\alpha n^{1/2}\epsilon}.$$

Hence $\liminf_{n\to\infty}\{n^{-1/2}\log I_n\} \ge -\alpha\epsilon$. Since $I_n \le 1$ for every $n$, and since $\epsilon$ is arbitrary, it follows that $n^{-1/2}\log I_n = o(1)$. Hence $n^{-1}\log p_n = \log\rho + o(1)$, by Lemma 2, and this is equivalent to the conclusion desired.
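Chernoff's theorem is easy to observe for Bernoulli($p$) summands with $p < a < 1$ (a lattice example, Case 2). Solving (4) gives $e^\tau = a(1-p)/[p(1-a)]$ and $\log\rho = a\log(p/a) + (1-a)\log[(1-p)/(1-a)]$. The sketch computes the exact binomial tail in log space, using log-gamma and log-sum-exp to avoid underflow. It shows that $p_n \le \rho^n$ and that $n^{-1}\log p_n - \log\rho$ shrinks toward 0. The function names and parameter values are illustrative.

```python
import math

def log_tail_bernoulli(n, p, a):
    """log P(X_1 + ... + X_n >= n a) for Bernoulli(p) summands (log-sum-exp)."""
    m = math.ceil(n * a - 1e-9)            # smallest admissible count, robust to rounding
    logs = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log1p(-p) for j in range(m, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def log_rho_bernoulli(p, a):
    """log rho from (3)-(4) for Bernoulli(p), p < a < 1."""
    tau = math.log(a * (1 - p) / (p * (1 - a)))   # solves phi'(tau)/phi(tau) = a
    return -a * tau + math.log(1 - p + p * math.exp(tau))

p, a = 0.3, 0.5
lr = log_rho_bernoulli(p, a)
gaps = [log_tail_bernoulli(n, p, a) / n - lr for n in (10, 100, 1000, 5000)]
```

The gaps shrink roughly like $(\log n)/(2n)$, consistent with the $(2\pi n)^{-1/2}$ factor in (5).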

The preceding argument depends only on the central limit theorem. In the following sections we estimate $I_n$ more accurately by substituting the expansions of $H_n(x)$ due to Cramér [1] and Esseen [7] in the right side of (12). The remainder of this section is concerned with preparations for this application of the Cramér-Esseen expansions. Almost all the considerations of the following paragraphs are well known, and we include them here only for the sake of completeness.

Let $\eta(w)$ denote the m.g.f. of $Z_\tau/\sigma$. According to Lemma 1, $\eta < \infty$ in a neighborhood of $w = 0$. For $j = 2, 3, \dots$ let $\lambda_j$ be defined by

(15) $\lambda_2 = \tfrac12$; $\quad \lambda_j = (j!\,\sigma^j)^{-1}(d^j/dt^j)\{\log\xi(t)\}_{t=0}$, $\quad (j = 3, 4, \dots)$.

It should be noted that $j!\,\lambda_j$ is the $j$th cumulant of the distribution of $Z_\tau/\sigma$. The m.g.f. of $U_n$, with $U_n$ defined by (10), is
$$[\eta(w/n^{1/2})]^n = \exp\left[n\sum_{j=2}^\infty \lambda_j(w/n^{1/2})^j\right].$$

Clearly, $[\eta(w/n^{1/2})]^n\exp(-w^2/2)$ is analytic in a domain independent of $n$, and can be expanded there as a power series in $w$. By regrouping the terms of this series according to powers of $n$ we shall have

(16) $[\eta(w/n^{1/2})]^n e^{-w^2/2} = \displaystyle\sum_{j=0}^\infty n^{-j/2}P_j(w)$,

where the $P_j$ are polynomials. $P_j$ is of degree $3j$, and $P_j$ is even or odd according as $j$ is even or odd. The first few polynomials are





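(17)
$P_0(w) = 1$,
$P_1(w) = \lambda_3w^3$,
$P_2(w) = \lambda_4w^4 + \tfrac12\lambda_3^2w^6$,
$P_3(w) = \lambda_5w^5 + \lambda_3\lambda_4w^7 + \tfrac16\lambda_3^3w^9$,
$P_4(w) = \lambda_6w^6 + (\tfrac12\lambda_4^2 + \lambda_3\lambda_5)w^8 + \tfrac12\lambda_3^2\lambda_4w^{10} + \tfrac1{24}\lambda_3^4w^{12}$.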


Write $\Phi^{(0)}(x) = \Phi(x)$ and $\Phi^{(r)}(x) = (d^r/dx^r)\Phi(x)$ for $r = 1, 2, \dots$, where $\Phi$ is given by (14). Let $P_j(-\Phi)$ denote the function of $x$ obtained by replacing $w^r$ with $(-1)^r\Phi^{(r)}(x)$ in the polynomial $P_j(w)$. It is clear that each $P_j(-\Phi)$ is absolutely continuous and of bounded variation in $(-\infty, \infty)$. It should also be noted that its derivative $P_j(-\Phi') = (d/dx)P_j(-\Phi)$ is square integrable with respect to Lebesgue measure.

In the following, for any function $K(x)$ of bounded variation in $(-\infty, \infty)$, we denote the c.f. of $K$ by $\chi(t \mid K)$, i.e.,

$$\chi(t \mid K) = \int_{-\infty}^{\infty}e^{itx}\,dK(x) \tag{18}$$

for every real $t$. If $K$ is absolutely continuous, $\chi$ is, of course, $(2\pi)^{1/2}$ times the Fourier transform of $K'$. The reader may refer to [8, Chapters I-III] for such elements of Fourier transform theory as are used in this paper.

Lemma 3. For every $j$, $t$, and $x$

$$\chi(t \mid P_j(-\Phi)) = P_j(it)\,e^{-t^2/2}, \tag{19}$$

$$P_j'(-\Phi)(x) = (2\pi)^{-1/2}\int_{-\infty}^{\infty}e^{-itx}P_j(it)\,d\Phi(t). \tag{20}$$


Proof. As is pointed out in [1, p. 49], we have

$$\chi(t \mid \Phi^{(r)}) = (-it)^r e^{-t^2/2} \tag{21}$$

for $r = 0, 1, \ldots$. Suppose, for given $j$, that $P_j(w) = \sum_{r=0}^{N}a_r w^r$, where the $a_r$ and $N$ are constants (depending on $j$). Then $P_j(-\Phi) = \sum_{r=0}^{N}a_r(-1)^r\Phi^{(r)}(x)$; hence the left side of (19) equals $\sum_{r=0}^{N}a_r(-1)^r\chi(t \mid \Phi^{(r)})$, and (19) now follows from (21). The relation (20) follows from (19) by the inversion formula for the Fourier transform, since $dP_j(-\Phi) = P_j'(-\Phi)\,dx$ and $d\Phi(t) = (2\pi)^{-1/2}e^{-t^2/2}\,dt$.

A probability d.f. $K(x)$ is said to satisfy condition (C) if

$$\limsup_{|t|\to\infty}|\chi(t \mid K)| < 1.$$
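To illustrate condition (C), the sketch below evaluates $|\chi(t \mid K)|$ for three hypothetical laws: a two-point (lattice) law, which fails (C) because $|\chi(2\pi m \mid K)| = 1$ for every integer $m$; the uniform law on $(0, 1)$, whose c.f. tends to 0; and an equal mixture of that uniform law with a point mass at 0, for which the limsup equals $\tfrac{1}{2}$. The maximum over a finite grid of large $t$ is only a crude numerical proxy for the limsup.

```python
import cmath
import math

def cf_two_point(t, p=0.3):
    """c.f. of the lattice law P(X = 1) = p, P(X = 0) = 1 - p."""
    return (1 - p) + p * cmath.exp(1j * t)

def cf_uniform(t):
    """c.f. of the uniform law on (0, 1); equals 1 at t = 0."""
    return 1.0 + 0j if t == 0 else (cmath.exp(1j * t) - 1) / (1j * t)

def cf_mixture(t):
    """c.f. of (1/2) Uniform(0, 1) + (1/2) point mass at 0."""
    return 0.5 * cf_uniform(t) + 0.5

def max_abs_cf(cf, t_lo, t_hi, steps=20001):
    """Largest |cf(t)| on an evenly spaced grid over [t_lo, t_hi]."""
    h = (t_hi - t_lo) / (steps - 1)
    return max(abs(cf(t_lo + k * h)) for k in range(steps))

print(abs(cf_two_point(2 * math.pi * 1000)))   # 1 up to rounding: (C) fails
print(max_abs_cf(cf_uniform, 1000, 2000))      # at most 2/1000
print(max_abs_cf(cf_mixture, 1000, 2000))      # close to 0.5
```

The mixture shows that (C) does not require absolute continuity: a non-degenerate absolutely continuous component is enough to keep the limsup below 1.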


In the following lemma the $F_i$ are arbitrary probability d.f.s.

Lemma 4. If $F_1$ satisfies (C), and if $F_1$ is absolutely continuous with respect to $F_2$, then $F_2$ also satisfies (C).

Proof. In this proof, for any probability d.f. $K$ let $K^*$ denote the symmetrized d.f. defined by $K^*(x) = \int_{-\infty}^{\infty}K(x + y)\,dK(y)$. We then have

$$\chi(t \mid K^*) = \int_{-\infty}^{\infty}\cos(tx)\,dK^*(x) = |\chi(t \mid K)|^2$$

for all $t$.





1020 R. R. BAHADUR AND R. RANGA RAO 


Suppose, contrary to the lemma, that there exists a sequence $\{t_j : j = 1, 2, \ldots\}$ such that $|t_j| \to \infty$ and $|\chi(t_j \mid F_2)| \to 1$ as $j \to \infty$. It then follows from the above paragraph with $K = F_2$ that $\int_{-\infty}^{\infty}\cos(t_jx)\,dF_2^*(x) \to 1$. Hence $\cos(t_jx) \to 1$ in $F_2^*$-measure. Since $F_2$-measure dominates $F_1$-measure, it is easily seen that $F_2^*$-measure dominates $F_1^*$-measure. Consequently, $\cos(t_jx) \to 1$ in $F_1^*$-measure. It now follows from the above paragraph with $K = F_1$ that $|\chi(t_j \mid F_1)|^2 \to 1$ as $j \to \infty$, which is impossible since $F_1$ satisfies (C). This completes the proof.

We conclude this section with a description of the functions $S_1(x), S_2(x), \ldots$ which occur in the Euler-Maclaurin sum formulae, and which are required in the analysis of Case 2. It is convenient to define $S_1$ as follows:

$$S_1(x) = \tfrac{1}{2} - x \quad\text{for } 0 < x \le 1; \qquad S_1(x + 1) = S_1(x). \tag{22}$$

For $j \ge 2$, $S_j$ may be defined as

$$S_j(x) = \begin{cases} 2\displaystyle\sum_{r=1}^{\infty}\frac{\cos(2\pi rx)}{(2\pi r)^j} & (j \text{ even}),\\[2ex] 2\displaystyle\sum_{r=1}^{\infty}\frac{\sin(2\pi rx)}{(2\pi r)^j} & (j \text{ odd}).\end{cases} \tag{23}$$

Each $S_j$ is a bounded and periodic function; $S_j$ is absolutely continuous for $j \ge 2$; and at each non-integral $x$ we have

$$S_1'(x) = -1, \qquad S_{j+1}'(x) = (-1)^j S_j(x) \quad (j = 1, 2, \ldots). \tag{24}$$
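The relations (22)-(24) can be checked numerically. The sketch below evaluates truncated versions of the series in (23); with $j = 1$ the odd-$j$ series converges (slowly) to (22) at non-integral $x$. It also uses the standard closed form $S_2(x) = (x^2 - x + \tfrac{1}{6})/2$ on $[0, 1]$, which is not stated in the paper, and checks $S_3' = S_2$ from (24) by a central difference. The truncation point `terms` and the test point are arbitrary assumptions.

```python
import math

def S(j, x, terms=20000):
    """Truncated series 2 * sum_{r <= terms} trig(2 pi r x) / (2 pi r)^j (cos for even j, sin for odd j)."""
    trig = math.cos if j % 2 == 0 else math.sin
    return 2 * sum(trig(2 * math.pi * r * x) / (2 * math.pi * r) ** j
                   for r in range(1, terms + 1))

x = 0.3
print(S(1, x), 0.5 - x)                                   # (22): S_1(x) = 1/2 - x on (0, 1]
print(S(2, x), (x * x - x + 1 / 6) / 2)                   # closed form of S_2 on [0, 1]
h = 1e-4
print((S(3, x + h) - S(3, x - h)) / (2 * h), S(2, x))     # (24) with j = 2: S_3' = S_2
```

Convergence is slowest for $j = 1$ (the series is only conditionally convergent), which is one reason the paper works with the explicit sawtooth (22) rather than its Fourier series.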


3. $I_n$ in Case 1. Suppose that the d.f. of $X_1$ satisfies (C). Since $Y_1 = X_1 - a$, it is plain that $F$, the d.f. of $Y_1$, also satisfies (C). It is easily seen that $F$ and $G$ (the d.f. of $Z_1$) are absolutely continuous with respect to each other. It therefore follows from Lemma 4 with $F_1 = F$ and $F_2 = G$ that $G$ also satisfies (C).

Let $k$ be an arbitrary but fixed positive integer. It follows from the conclusion of the preceding paragraph by Cramér's theorem [1, p. 81] that $H_n(x) = K_n(x) + R_n(x)$, where

$$K_n(x) = \sum_{j=0}^{k}n^{-j/2}P_j(-\Phi) \tag{25}$$

and $R_n(x)$ is of the order $n^{-(k+1)/2}$ uniformly in $x$. It follows hence from (12) that

$$I_n = n^{1/2}\alpha\int_0^{\infty}e^{-n^{1/2}\alpha x}[K_n(x) - K_n(0)]\,dx + O(n^{-(k+1)/2}). \tag{26}$$

We have

$$\chi(t \mid K_n) = \sum_{j=0}^{k}n^{-j/2}P_j(it)\,e^{-t^2/2} \tag{27}$$


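by (19) and (25). Let $f_n(x) = \exp(-n^{1/2}\alpha x)$ for $x \ge 0$ and $f_n(x) = 0$ otherwise. Then $\int_{-\infty}^{\infty}e^{itx}f_n(x)\,dx = 1/(n^{1/2}\alpha - it) = g_n(t)$, say. Consequently, by first using integration by parts and then Parseval's formula, it follows that

$$\begin{aligned} n^{1/2}\alpha\int_0^{\infty}e^{-n^{1/2}\alpha x}[K_n(x) - K_n(0)]\,dx &= \int_0^{\infty}e^{-n^{1/2}\alpha x}K_n'(x)\,dx\\ &= \int_{-\infty}^{\infty}f_n(x)K_n'(x)\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}g_n(-t)\,\chi(t \mid K_n)\,dt. \end{aligned} \tag{28}$$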


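It follows from (26), (27) and (28) that

$$\alpha(2\pi n)^{1/2}I_n = \int_{-\infty}^{\infty}\big(1 + itn^{-1/2}\alpha^{-1}\big)^{-1}\Big(\sum_{j=0}^{k}n^{-j/2}P_j(it)\Big)\,d\Phi(t) + O(n^{-k/2}). \tag{29}$$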


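Define

$$\mu_{r,s} = \int_{-\infty}^{\infty}(it)^r P_s(it)\,d\Phi(t) \qquad (r, s = 0, 1, 2, \ldots). \tag{30}$$

Since $P_s$ is an even [odd] polynomial if $s$ is even [odd], and since $\int_{-\infty}^{\infty}t^{2j+1}\,d\Phi(t) = 0$ for $j = 0, 1, 2, \ldots$, it follows that each $\mu_{r,s}$ is a real constant, and that

$$\mu_{r,s} = 0 \quad\text{if } r + s \text{ is odd}. \tag{31}$$

Now, $(1 + itn^{-1/2}\alpha^{-1})^{-1} = \sum_{0 \le r \le k}(-itn^{-1/2}\alpha^{-1})^r + n^{-(k+1)/2}|t|^{k+1}\omega_n(t)$, where $|\omega_n|$ is bounded in $n$ and $t$. Since $\Phi$ has finite moments of all orders, it therefore follows from (29), (30) and (31) that

$$\alpha(2\pi n)^{1/2}I_n = \sum_{0 \le i \le k/2}n^{-i}\Big\{\sum_{r+s=2i}(-\alpha^{-1})^r\mu_{r,s}\Big\} + O(n^{-k/2}). \tag{32}$$

Since $p_n = \rho^n I_n$, and since $\mu_{0,0} = 1$, it follows by replacing $k$ with $2k + 2$ in (32) that (6) holds for any given $k$, with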


$$b_n = \alpha^{-1} \tag{33}$$

and

$$c_{j,n} = \sum_{r+s=2j}(-\alpha^{-1})^r\mu_{r,s} \qquad (j = 1, 2, \ldots) \tag{34}$$

for every $n$. This establishes Theorem 2, and hence also Theorem 1, in Case 1.

It follows from (17) and (30) that the coefficients $\mu_{r,s}$ required to compute $c_{1,n}$ according to (34) are


$$\mu_{2,0} = -1, \qquad \mu_{1,1} = 3\lambda_3, \qquad \mu_{0,2} = 3\lambda_4 - \tfrac{15}{2}\lambda_3^2, \tag{35}$$







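where the $\lambda_j$ are given by (15). Similarly, $c_{2,n}$ can be computed from

$$\begin{aligned} \mu_{4,0} &= 3, \qquad \mu_{3,1} = -15\lambda_3, \qquad \mu_{2,2} = -15\lambda_4 + \tfrac{105}{2}\lambda_3^2,\\ \mu_{1,3} &= -15\lambda_5 + 105\lambda_3\lambda_4 - \tfrac{315}{2}\lambda_3^3,\\ \mu_{0,4} &= -15\lambda_6 + 105(\tfrac{1}{2}\lambda_4^2 + \lambda_3\lambda_5) - \tfrac{945}{2}\lambda_3^2\lambda_4 + \tfrac{10395}{24}\lambda_3^4. \end{aligned} \tag{36}$$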
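For numerical work, (34) combined with (35) and (36) gives $c_{1,n}$ and $c_{2,n}$ in Case 1 directly from $\alpha$ and $\lambda_3, \ldots, \lambda_6$. The Python sketch below transcribes those formulas with exact rational arithmetic; the input values are hypothetical. As a check, in the normal case ($\lambda_j = 0$ for $j \ge 3$) it gives $c_{1,n} = -\alpha^{-2}$ and $c_{2,n} = 3\alpha^{-4}$, in agreement with the terms $-x^{-3}$ and $3x^{-5}$ of the expansion (54) at $x = n^{1/2}\alpha$.

```python
from fractions import Fraction as Fr

def mu_table(l3, l4=0, l5=0, l6=0):
    """The constants mu_{r,s} of (35) and (36), as functions of lambda_3, ..., lambda_6."""
    return {
        (2, 0): -1,
        (1, 1): 3 * l3,
        (0, 2): 3 * l4 - Fr(15, 2) * l3 ** 2,
        (4, 0): 3,
        (3, 1): -15 * l3,
        (2, 2): -15 * l4 + Fr(105, 2) * l3 ** 2,
        (1, 3): -15 * l5 + 105 * l3 * l4 - Fr(315, 2) * l3 ** 3,
        (0, 4): (-15 * l6 + 105 * (Fr(1, 2) * l4 ** 2 + l3 * l5)
                 - Fr(945, 2) * l3 ** 2 * l4 + Fr(10395, 24) * l3 ** 4),
    }

def c_coeff(j, alpha, mu):
    """c_{j,n} of (34): sum over r + s = 2j of (-1/alpha)^r mu_{r,s} (j = 1 or 2 here)."""
    alpha = Fr(alpha)
    return sum((-1 / alpha) ** r * mu[(r, 2 * j - r)] for r in range(2 * j + 1))

# Normal case with alpha = 2: all lambda_j (j >= 3) vanish.
print(c_coeff(1, 2, mu_table(0)))   # -1/4
print(c_coeff(2, 2, mu_table(0)))   # 3/16
```

Exact fractions keep the large cancelling terms in $\mu_{0,4}$ free of rounding; floats may be passed in for the $\lambda_j$ when they are only known numerically.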

We conclude this section with a remark concerning the role of Cramér's theorem [1, p. 81] in the preceding argument. Suppose that $H_n$ is absolutely continuous, and that $H_n'$ is square integrable over $(-\infty, \infty)$. It then follows, by integrating (12) by parts and using Parseval's formula, that

$$\alpha(2\pi n)^{1/2}I_n = \int_{-\infty}^{\infty}\big(1 + itn^{-1/2}\alpha^{-1}\big)^{-1}\big\{[\eta(it/n^{1/2})]^n e^{t^2/2}\big\}\,d\Phi(t), \tag{29*}$$

where $\eta$ is, as before, the m.g.f. of $Z_1/\sigma$. (The square integrability condition is imposed here for the validity of Parseval's formula, and can be replaced by others, e.g., that $(1 + |t|)^{-1}|\eta(it)|$ be integrable.) According to (16), the function in curly brackets on the right side of (29*) can be expressed as $\sum_{j=0}^{\infty}n^{-j/2}P_j(it)$. By comparing (29) and (29*) it is seen that, from a technical point of view, the role of Cramér's theorem in the present special case is to guarantee that when $\sum_{j=0}^{\infty}n^{-j/2}P_j$ is replaced by $\sum_{j=0}^{k}n^{-j/2}P_j$ on the right side of (29*), the error introduced is indeed of the order $n^{-k/2}$. The same remark, but with (29*) replaced by a rather different formula for $I_n$, applies to the role of Esseen's theorem in the argument of the following section.


4. $I_n$ in Case 2. Suppose that $X_1$ is a lattice variable. Let $d$ be the maximum span of $X_1$, i.e., $d > 0$ is the g.c.d. of the differences between consecutive possible values of $X_1$. Let $x_0$ be the number such that $a \le x_0 < a + d$, and such that the possible values of $X_1$ are included in the set $\{x_0 + rd : r = 0, \pm 1, \pm 2, \ldots\}$. Let

$$\beta = d/\sigma, \qquad \gamma = \tau d, \qquad \kappa = (x_0 - a)/d. \tag{37}$$

It should be noted that $0 \le \kappa < 1$. For each $n$, let

$$\theta_n = n\kappa - [n\kappa], \qquad 0 \le \theta_n < 1, \tag{38}$$

where $[x]$ denotes the greatest integer contained in $x$.

Let $k$ be an arbitrary but fixed positive integer. It follows from Esseen's theorem for the lattice case [7, p. 61] that $H_n(x) = K_n(x) + L_n(x) + R_n(x)$, where $K_n(x)$ is given by (25), $R_n$ is of the order $n^{-(k+1)/2}$ uniformly in $x$, and $L_n$ is defined as follows. For any $j = 1, 2, \ldots$ let $h_j = 1$ if $j \equiv 1$ or $2 \pmod 4$ and $h_j = -1$ if $j \equiv 0$ or $3 \pmod 4$. Then

$$L_n(x) = \sum_{j=1}^{k}n^{-j/2}h_j\beta^j S_j(n^{1/2}\beta^{-1}x - \theta_n)K_n^{(j)}(x) = \sum_{j=1}^{k}M_{j,n}(x), \text{ say}, \tag{39}$$

where $K_n^{(j)}$ is the $j$th derivative of $K_n$. It follows hence from (12) that

$$\begin{aligned} I_n = {}& n^{1/2}\alpha\int_0^{\infty}e^{-n^{1/2}\alpha x}[K_n(x) - K_n(0)]\,dx\\ &+ \sum_{j=1}^{k}n^{1/2}\alpha\int_0^{\infty}e^{-n^{1/2}\alpha x}[M_{j,n}(x) - M_{j,n}(0)]\,dx + O(n^{-(k+1)/2}). \end{aligned} \tag{40}$$


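The first term on the right side of (40) is (cf. (28)) equal to $\int_0^{\infty}e^{-n^{1/2}\alpha x}K_n'(x)\,dx$. We observe next that, for $j \ge 2$,

$$\begin{aligned} n^{1/2}\alpha\int_0^{\infty}e^{-n^{1/2}\alpha x}&[M_{j,n}(x) - M_{j,n}(0)]\,dx = \int_0^{\infty}e^{-n^{1/2}\alpha x}M_{j,n}'(x)\,dx\\ &= n^{-j/2}h_j\beta^j\int_0^{\infty}e^{-n^{1/2}\alpha x}\big[S_j(y_n)K_n^{(j+1)}(x) + (-1)^{j-1}n^{1/2}\beta^{-1}S_{j-1}(y_n)K_n^{(j)}(x)\big]\,dx\\ &= n^{-j/2}h_j\beta^j\int_0^{\infty}e^{-n^{1/2}\alpha x}S_j(y_n)K_n^{(j+1)}(x)\,dx\\ &\quad - n^{-(j-1)/2}h_{j-1}\beta^{j-1}\int_0^{\infty}e^{-n^{1/2}\alpha x}S_{j-1}(y_n)K_n^{(j)}(x)\,dx\\ &= N_{j,n} - N_{j-1,n} \quad\text{(say)}. \end{aligned} \tag{41}$$

In (41), we have put $n^{1/2}\beta^{-1}x - \theta_n = y_n$, and used integration by parts, (24), and the identity $(-1)^j h_j = h_{j-1}$. In order to evaluate the contribution of $M_{1,n}$ to the right side of (40), suppose for the moment that $0 < \theta_n < 1$, and let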


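$$\zeta_0 = 0, \qquad \zeta_r = (r - 1 + \theta_n)\beta/n^{1/2} \quad (r = 1, 2, \ldots). \tag{42}$$

Let $A_r$ denote the open interval $(\zeta_{r-1}, \zeta_r)$. Then $S_1(y_n)$ is linear in $x$ over each $A_r$ (cf. (22)), and its derivative there equals $-n^{1/2}\beta^{-1}$. By writing $\int_0^{\infty} = \sum_{r=1}^{\infty}\int_{A_r}$, and applying integration by parts to $\int_{A_r}$, it follows without difficulty that

$$\begin{aligned} n^{1/2}\alpha\int_0^{\infty}e^{-n^{1/2}\alpha x}M_{1,n}(x)\,dx = {}& -\int_0^{\infty}e^{-n^{1/2}\alpha x}K_n'(x)\,dx\\ &+ N_{1,n} + M_{1,n}(0) + \beta n^{-1/2}\sum_{r=1}^{\infty}e^{-\gamma(r-1+\theta_n)}K_n'(\zeta_r), \end{aligned} \tag{43}$$

where $\gamma = \alpha\beta = \tau d$ (cf. (37)). Now, $S_1(x)$ is a left-continuous function of $x$. It follows hence that, for given $n$, the left and right sides of (43) are right-continuous in $\theta_n$. Since (43) holds for each $\theta_n$ in $(0, 1)$, we conclude that (43) is valid for $\theta_n = 0$ also.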

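Since $S_k$ and $K_n^{(k+1)}$ are bounded functions, it is plain from the definition of $N_{k,n}$ (cf. (41)) that $N_{k,n}$ is of the order $n^{-(k+1)/2}$. It therefore follows from (40), (41) and (43) that

$$I_n = \beta n^{-1/2}\sum_{r=1}^{\infty}e^{-\gamma(r-1+\theta_n)}K_n'(\zeta_r) + O(n^{-(k+1)/2}). \tag{44}$$

Now, according to (20) and (25),

$$K_n'(\zeta_r) = (2\pi)^{-1/2}\int_{-\infty}^{\infty}e^{-it\zeta_r}\Big(\sum_{j=0}^{k}n^{-j/2}P_j(it)\Big)\,d\Phi(t) \tag{45}$$

for every $r$. Let us write

$$z = e^{-\gamma}, \qquad b_n = [\beta/(1 - z)]\,z^{\theta_n}. \tag{46}$$

It follows from (44) and (45) that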


$$b_n^{-1}(2\pi n)^{1/2}I_n = \int_{-\infty}^{\infty}\frac{(1 - z)\exp[-it\beta\theta_n/n^{1/2}]}{1 - z\exp[-it\beta/n^{1/2}]}\Big(\sum_{j=0}^{k}n^{-j/2}P_j(it)\Big)\,d\Phi(t) + O(n^{-k/2}). \tag{47}$$


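For any $\theta$ and any $j = 0, 1, 2, \ldots$ let

$$f_j(\theta) = \frac{1}{j!}\,\frac{d^j}{dw^j}\left[\frac{(1 - z)e^{-\theta w}}{1 - ze^{-w}}\right]_{w=0}. \tag{48}$$

It then follows easily from (31) and (47) that

$$b_n^{-1}(2\pi n)^{1/2}I_n = \sum_{0 \le i \le k/2}n^{-i}\Big\{\sum_{r+s=2i}\beta^r f_r(\theta_n)\mu_{r,s}\Big\} + O(n^{-k/2}). \tag{49}$$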


By replacing $k$ with $2k + 2$ in (49) we see that (6) holds for any given $k$, with $b_n$ given by (46), and

$$c_{j,n} = \sum_{r+s=2j}\beta^r f_r(\theta_n)\mu_{r,s}. \tag{50}$$

This establishes Theorem 2 in Case 2, and hence also the first part of Theorem 1. To complete the proof of Theorem 1 in Case 2, we see from (37), (38) that $P(X_1 + \cdots + X_n = na) > 0$ implies $\theta_n = 0$. Consequently if $P(X_1 = a) > 0$ then $\theta_n = 0$ for every $n$, and hence $b_n = \beta/(1 - z)$ for every $n$.

It may be worthwhile to note that in the present case $b_n$ can be expressed as $\alpha^{-1}[\gamma e^{\gamma(1-\theta_n)}/(e^{\gamma} - 1)]$, which shows that, in general, $b_n$ oscillates about the value $\alpha^{-1}$ (cf. (33)) as $n \to \infty$ through the sequence $1, 2, \ldots$.
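The oscillation of $b_n$ can be made concrete by treating $\theta_n$ as a variable $\theta \in [0, 1)$: the bracket $\gamma e^{\gamma(1-\theta)}/(e^{\gamma} - 1)$ decreases from $\gamma e^{\gamma}/(e^{\gamma} - 1) > 1$ towards $\gamma/(e^{\gamma} - 1) < 1$, and its integral over $[0, 1]$ equals 1. The sketch below, with hypothetical values of $\alpha$ and $\gamma$, checks that this expression agrees with the form (46) (using $\beta = \gamma/\alpha$, from $\gamma = \alpha\beta$) and that its average over $\theta$ is $\alpha^{-1}$.

```python
import math

def b_lattice(alpha, gamma, theta):
    """b_n = alpha^{-1} * gamma e^{gamma (1 - theta)} / (e^gamma - 1), with theta standing for theta_n."""
    return gamma * math.exp(gamma * (1 - theta)) / (math.expm1(gamma) * alpha)

def b_from_46(alpha, gamma, theta):
    """Same quantity via (46): beta z^theta / (1 - z), z = e^{-gamma}, beta = gamma / alpha."""
    beta, z = gamma / alpha, math.exp(-gamma)
    return beta * z ** theta / (1 - z)

alpha, gamma = 0.8, 1.5          # hypothetical values
for theta in (0.0, 0.25, 0.5, 0.75):
    print(theta, b_lattice(alpha, gamma, theta), b_from_46(alpha, gamma, theta))

# Midpoint-rule average over theta in [0, 1]; it should be close to 1 / alpha = 1.25.
m = 10000
print(sum(b_lattice(alpha, gamma, (i + 0.5) / m) for i in range(m)) / m)
```

When $\kappa$ in (37) is irrational the $\theta_n$ are equidistributed in $[0, 1)$, so in that situation the long-run average of $b_n$ is exactly this integral.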

An alternative formula for the coefficients $f_j$ required in (50) is

$$f_j(\theta) = (-1)^j\sum_{r+s=j}\frac{\theta^r}{r!\,s!}\Big\{(1 - z)\Big(z\frac{d}{dz}\Big)^s(1 - z)^{-1}\Big\}. \tag{51}$$







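From (51) it is easily seen that, with $u = z/(1 - z)$,

$$\begin{aligned} f_0 &= 1,\\ f_1 &= -(\theta + u),\\ f_2 &= \tfrac{1}{2}\{(\theta + u)^2 + u(1 + u)\},\\ f_3 &= -\tfrac{1}{6}\{(\theta + u)^3 + 3u(1 + u)\theta + u(1 + u)(1 + 5u)\},\\ f_4 &= \tfrac{1}{24}\{(\theta + u)^4 + 6u(1 + u)\theta^2 + 4u(1 + u)(1 + 5u)\theta + 23u^4 + 36u^3 + 14u^2 + u\}. \end{aligned} \tag{52}$$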
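The closed forms (52) can be checked against (51) numerically. The sketch below evaluates (51) through the identity $(1 - z)(z\,d/dz)^s(1 - z)^{-1} = (1 - z)\sum_{m \ge 0}m^s z^m$, truncating that series at an arbitrary number of terms (adequate for $z$ not too close to 1), and compares the result with (52) at hypothetical values $\theta = 0.3$, $z = 0.5$.

```python
from math import factorial

def f_from_51(theta, z, jmax, terms=400):
    """f_0(theta), ..., f_jmax(theta) from (51), with the z-series truncated at `terms`."""
    # a[s] = (1 - z) * sum_{m >= 0} m^s z^m  =  (1 - z) (z d/dz)^s (1 - z)^{-1}
    a = [(1 - z) * sum(m ** s * z ** m for m in range(terms)) for s in range(jmax + 1)]
    return [(-1) ** j * sum(theta ** r / (factorial(r) * factorial(j - r)) * a[j - r]
                            for r in range(j + 1))
            for j in range(jmax + 1)]

def f_from_52(theta, z):
    """The closed forms (52), with u = z / (1 - z)."""
    u = z / (1 - z)
    t = theta + u
    return [
        1.0,
        -t,
        (t ** 2 + u * (1 + u)) / 2,
        -(t ** 3 + 3 * u * (1 + u) * theta + u * (1 + u) * (1 + 5 * u)) / 6,
        (t ** 4 + 6 * u * (1 + u) * theta ** 2 + 4 * u * (1 + u) * (1 + 5 * u) * theta
         + 23 * u ** 4 + 36 * u ** 3 + 14 * u ** 2 + u) / 24,
    ]

for lhs, rhs in zip(f_from_51(0.3, 0.5, 4), f_from_52(0.3, 0.5)):
    print(lhs, rhs)
```

Extending `f_from_51` to larger `jmax` gives the coefficients needed for $c_{j,n}$ with $j \ge 3$ without expanding (51) by hand.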


The coefficients $c_{1,n}$ and $c_{2,n}$ can be computed from (35), (36), (50) and (52). The formulae for $b_n$ and $c_{1,n}$ with $\theta_n = 0$ agree with the results of [3].


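5. $I_n$ in Case 3. If $X_1$ is not a lattice variable, then neither is $Z_1$. It follows hence from a theorem of Esseen [7, p. 49] that $H_n(x) = \Phi(x) + n^{-1/2}f(x) + n^{-1/2}r_n(x)$, where $f(x) = (\text{const.})(1 - x^2)\exp(-\tfrac{1}{2}x^2)$, and $r_n(x) \to 0$ uniformly in $x$ as $n \to \infty$. The contribution of $n^{-1/2}f$ to $I_n$ is $n^{-1/2}\int_0^{\infty}e^{-n^{1/2}\alpha x}f'(x)\,dx$, which is easily seen to be of the order $n^{-1}$. It follows that

$$\begin{aligned} I_n &= n^{1/2}\alpha\int_0^{\infty}e^{-n^{1/2}\alpha x}[\Phi(x) - \Phi(0)]\,dx + o(n^{-1/2})\\ &= \int_0^{\infty}e^{-n^{1/2}\alpha x}\Phi'(x)\,dx + o(n^{-1/2})\\ &= e^{n\alpha^2/2}[1 - \Phi(n^{1/2}\alpha)] + o(n^{-1/2})\\ &= (2\pi n)^{-1/2}\alpha^{-1} + o(n^{-1/2}). \end{aligned} \tag{53}$$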


In (53), we have used integration by parts, a linear change of variable, and the leading term of the asymptotic formula [9, p. 179]

$$1 - \Phi(x) = (2\pi)^{-1/2}e^{-x^2/2}\{x^{-1} - x^{-3} + 3x^{-5} + O(x^{-7})\} \quad\text{as } x \to \infty. \tag{54}$$

It follows from (53) that (5) holds, with $b_n = \alpha^{-1}$ for every $n$. This completes the proof of Theorem 1.
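The last two steps of (53) and the expansion (54) are easy to compare numerically, since $1 - \Phi(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$. In the sketch below the values $\alpha = 0.5$, $n = 400$ are hypothetical; the direct formula is adequate for moderate arguments but overflows for very large $x$.

```python
import math

def scaled_tail(x):
    """e^{x^2/2} [1 - Phi(x)], using 1 - Phi(x) = erfc(x / sqrt(2)) / 2."""
    return math.exp(x * x / 2) * 0.5 * math.erfc(x / math.sqrt(2))

def scaled_tail_series(x):
    """Three-term version of (54): (2 pi)^{-1/2} (x^{-1} - x^{-3} + 3 x^{-5})."""
    return (1 / x - 1 / x ** 3 + 3 / x ** 5) / math.sqrt(2 * math.pi)

alpha, n = 0.5, 400                              # hypothetical values
x = math.sqrt(n) * alpha                         # x = n^{1/2} alpha = 10
print(scaled_tail(x), scaled_tail_series(x))     # relative difference of order x^{-6}
print(1 / (alpha * math.sqrt(2 * math.pi * n)))  # leading term (2 pi n)^{-1/2} alpha^{-1} of (53)
```

The gap between the first and last printed values, about one percent here, is the $-x^{-3}$ correction, which is what the coefficient $c_{1,n}$ of Theorem 2 refines.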

Since $\Phi(x) + n^{-1/2}f(x) = K_n(x)$, where $K_n$ is defined by (25) with $k = 1$, the conclusion of the preceding paragraph is also available from the argument of Section 3. We have used a direct calculation instead because this calculation suggests the form of the numerical approximations described in the following section.


6. Concluding remarks. Suppose, in a given case, and for given $n$ and $a$, that it is required to compute the numerical value of $p_n$ defined by (1). In this section we consider approximations of the form

$$q_n = \rho^n e^{v_n^2/2}[1 - \Phi(v_n)], \tag{55}$$

where $\rho$ and $\Phi$ are defined by (4) and (14), and $v_n$ is a suitably chosen number.







We shall describe four choices of $v_n$, called $v_n^{(1)}, \ldots, v_n^{(4)}$. The resulting values of $q_n$ are denoted by $q_n^{(1)}, q_n^{(2)}$, etc. First consider

$$v_n^{(1)} = n^{1/2}\alpha, \tag{56}$$

where $\alpha$ is given by (4), (8) and (9). This choice of $v_n$ amounts (cf. (53)) to approximating $I_n$ by replacing $H_n$ with $\Phi$ on the right side of (12). It therefore follows from the Esseen-Berry theorem that we always have

$$|p_n - q_n^{(1)}| \le 2C\rho^n\,\frac{E|Z_1|^3}{\sigma^3 n^{1/2}}, \tag{57}$$

where $C$ is a universal constant. Wallace [10, p. 637] states that $C \le 2.05$.
Next, consider

$$v_n^{(2)} = n^{1/2}/b_n, \tag{58}$$

where $b_n$ is defined by (33) in Cases 1 and 3, and by (46) in Case 2. (Of course, $q_n^{(2)} = q_n^{(1)}$ in Cases 1 and 3.) Then $q_n^{(2)}$ satisfies (2), and the $o(1)$ term in (2) is known to be of the order $n^{-1}$ in Cases 1 and 2. Finally, let $c_{j,n}$ be defined according to Section 3 in Cases 1 and 3, and according to Section 4 in Case 2. Define

$$v_n^{(3)} = v_n^{(2)}\big[1 - (b_n^2 + c_{1,n})/n\big] \tag{59}$$

if the expression within the square brackets is positive and $v_n^{(3)} = 0$ otherwise; and

$$v_n^{(4)} = v_n^{(3)}\big[1 + (b_n^4 + c_{1,n}^2 - b_n^2c_{1,n} - c_{2,n})/n^2\big] \tag{60}$$

if the expression in square brackets is positive and $v_n^{(4)} = 0$ otherwise. Then $q_n^{(j+2)}$ also satisfies (2), and $o(1) = O(n^{-(j+1)})$ in Cases 1 and 2 ($j = 1, 2$). The stated theoretical properties of the approximations $q_n^{(j)}$ are easy consequences of (5), (6), (54), and (58).

Although (unlike $q_n^{(1)}$) the approximations $q_n^{(2)}$, $q_n^{(3)}$, $q_n^{(4)}$ are derived from asymptotic expansions corresponding to the case when $n \to \infty$ and $a$ is held fixed, the usefulness of these approximations may be wider than is suggested by the derivation. Some evidence to this effect is provided by the fact that if $X_1$ is normally distributed then $p_n = q_n^{(2)} = q_n^{(3)} = q_n^{(4)}$ for every admissible $a$ and every $n$.
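The four approximations are simple to evaluate once $\rho$, $\alpha$, $b_n$, $c_{1,n}$ and $c_{2,n}$ are available. The sketch below transcribes (55)-(60) under that assumption; it does not compute these inputs. As a check of the preceding remark, it takes $X_1$ standard normal with a hypothetical $a = 0.5$, for which $\rho = e^{-a^2/2}$, $\alpha = a$, $b_n = \alpha^{-1}$, $c_{1,n} = -\alpha^{-2}$, $c_{2,n} = 3\alpha^{-4}$ and $p_n = 1 - \Phi(n^{1/2}a)$.

```python
import math

def scaled_tail(v):
    """e^{v^2/2} [1 - Phi(v)]; direct formula, adequate for moderate v."""
    return math.exp(v * v / 2) * 0.5 * math.erfc(v / math.sqrt(2))

def q_approximations(n, rho, alpha, b, c1, c2):
    """q_n^{(1)}, ..., q_n^{(4)} of (55), with v_n chosen by (56), (58), (59), (60).

    b, c1, c2 stand for b_n, c_{1,n}, c_{2,n}; they must be supplied by the caller.
    """
    v1 = math.sqrt(n) * alpha                                # (56)
    v2 = math.sqrt(n) / b                                    # (58)
    s3 = 1 - (b ** 2 + c1) / n
    v3 = v2 * s3 if s3 > 0 else 0.0                          # (59)
    s4 = 1 + (b ** 4 + c1 ** 2 - b ** 2 * c1 - c2) / n ** 2
    v4 = v3 * s4 if s4 > 0 else 0.0                          # (60)
    return [rho ** n * scaled_tail(v) for v in (v1, v2, v3, v4)]

# Standard normal X_1 with a = 0.5 (hypothetical choice): every q_n^{(j)} should equal p_n.
a, n = 0.5, 10
exact = 0.5 * math.erfc(math.sqrt(n) * a / math.sqrt(2))
print(exact, q_approximations(n, math.exp(-a * a / 2), a, 1 / a, -1 / a ** 2, 3 / a ** 4))
```

In the normal case both correction brackets in (59) and (60) are exactly 1, which is why all four choices coincide with $p_n$.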


REFERENCES

[1] H. Cramér, Random Variables and Probability Distributions, Cambridge University Press, 1937.

[2] H. Cramér, "Sur un nouveau théorème-limite de la théorie des probabilités," Actualités Scientifiques et Industrielles, No. 736, Hermann et Cie, Paris, 1938.

[3] David Blackwell and J. L. Hodges, "The probability in the extreme tail of a convolution," Ann. Math. Stat., Vol. 30 (1959), pp. 1113-1120.

[4] Herman Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," Ann. Math. Stat., Vol. 23 (1952), pp. 493-507.

[5] V. V. Petrov, "Generalization of Cramér's limit theorem," Uspekhi Mat. Nauk, Vol. 9, No. 4 (1955), pp. 195-202. (In Russian.)

[6] R. R. Bahadur, "Some approximations to the binomial distribution function," Ann. Math. Stat., Vol. 31 (1960), pp. 43-54.

[7] Carl Gustav Esseen, "Fourier analysis of distribution functions," Acta Mathematica, Vol. 77 (1945), pp. 1-125.

[8] E. C. Titchmarsh, Introduction to the Theory of Fourier Integrals, Oxford University Press, 1937.

[9] William Feller, An Introduction to Probability Theory and its Applications, Vol. I, 2nd Ed., John Wiley and Sons, New York, 1957.

[10] David L. Wallace, "Asymptotic approximations to distributions," Ann. Math. Stat., Vol. 29 (1958), pp. 635-654.





ON THE INDEPENDENCE OF A SAMPLE CENTRAL MOMENT
AND THE SAMPLE MEAN¹


By R. G. Laha, E. Lukacs and M. Newman


The Catholic University of America and National Bureau of Standards 


1. Introduction. Let $X_1, X_2, \ldots, X_N$ be a random sample of size $N$ (independently and identically distributed random variables) from a population with distribution function $F(x)$. It is known that the population can sometimes be characterized by the independence of a suitable statistic² $S = S(X_1, X_2, \ldots, X_N)$ and the sample mean $\bar{X} = \sum_{j=1}^{N}X_j/N$. If $S$ is a polynomial statistic then the independence of $S$ and $\bar{X}$ yields a differential equation for the characteristic function of $F(x)$. In order to determine $F(x)$ we must study this differential equation and find all its positive definite solutions. In the case of certain polynomial statistics, such as the k-statistics or quadratic polynomials, it is comparatively easy to obtain all positive definite solutions of this differential equation. In many cases, however, this procedure is not feasible since it is often very difficult to decide whether a given function is positive definite. If we consider, for example, a normal population then any central sample moment $m_p = \sum_{j=1}^{N}(X_j - \bar{X})^p/N$ and the sample mean $\bar{X}$ are independent. But, when we investigate whether this property characterizes the normal population for $p > 3$, then it is practically impossible to determine all positive definite solutions of the corresponding differential equation.

In the present paper we prove the following theorem. 

Theorem. Let $X_1, X_2, \ldots, X_N$ be a sample of size $N$ from a certain population. Let $p$ be a positive integer such that $(p - 1)!$ is not divisible by $N - 1$. The population is normal if and only if the sample central moment $m_p$ of order $p$ is distributed independently of the sample mean $\bar{X}$.

Remark. The condition that $(p - 1)!$ is not divisible by $N - 1$ is satisfied if $N > (p - 1)! + 1$.

For the proof of this statement we use a theorem which was recently derived 
by Linnik [1] and Zinger [2]. 

In Section 2 we derive two combinatorial lemmas which are essential for the proof of the theorem. In Section 3 we give some analytical results, and in Section 4 we finally deduce the theorem.


2. Combinatorial lemmas. Let $x_0, x_1, \ldots, x_n$ be $n + 1$ real variables. Suppose that

$$P = P(x_0, x_1, \ldots, x_n) = \sum{}^{*}A_{j_0j_1\cdots j_n}x_0^{j_0}x_1^{j_1}\cdots x_n^{j_n} \tag{2.1}$$


Received November 28, 1950; revised March 11, 1960. 

¹ The research of the first two authors was supported by the National Science Foundation under grants NSF-G-4220 and NSF-G-9968. The work of the third author was supported, in part, by the Office of Naval Research.

² A statistic is a real, single valued and measurable function of the observations $X_1, X_2, \ldots, X_N$.







INDEPENDENCE OF CENTRAL MOMENT AND MEAN 1029 


is a polynomial of degree $p$ with real coefficients. Here and in the following the summation $\sum^*$ is extended over all non-negative integers $j_0, j_1, \ldots, j_n$ which satisfy the condition $j_0 + j_1 + \cdots + j_n = p$. If we replace each $x_k^{j_k}$ by $\nu^{(j_k)} = \nu(\nu - 1)\cdots(\nu - j_k + 1)$ in the polynomial $P$, we obtain a polynomial

$$\pi_p(\nu) = \sum{}^{*}A_{j_0j_1\cdots j_n}\nu^{(j_0)}\nu^{(j_1)}\cdots\nu^{(j_n)} \tag{2.2}$$

of degree $p$ in the real variable $\nu$. We write here $\nu^{(0)} = 1$. The polynomial $\pi_p(\nu)$ is called the adjoint polynomial of $P$.


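In this section we study the adjoint polynomial when $P$ has a special form.

Lemma 2.1. Let

$$P = P(x_0, x_1, \ldots, x_n) = (x_0 - \bar{x})^p,$$

where $\bar{x} = \sum_{i=0}^{n}x_i/(n + 1)$. The adjoint polynomial of $P$ is then

$$\pi_p(\nu) = \frac{(-1)^p\,p!}{(n + 1)^p}\sum_{j=0}^{p}(-n)^j\binom{\nu}{j}\binom{n\nu}{p - j}.$$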


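Proof. We note that

$$(x_0 - \bar{x})^p = \frac{(-1)^p}{(n + 1)^p}\sum{}^{*}(p; j_0, j_1, \ldots, j_n)(-n)^{j_0}x_0^{j_0}x_1^{j_1}\cdots x_n^{j_n}. \tag{2.3}$$

Here $(p; j_0, \ldots, j_n) = p!/(j_0!\cdots j_n!)$ is a multinomial coefficient. It follows from (2.3) that

$$\begin{aligned} \pi_p(\nu) &= \frac{(-1)^p}{(n + 1)^p}\sum{}^{*}(p; j_0, \ldots, j_n)(-n)^{j_0}\nu^{(j_0)}\cdots\nu^{(j_n)}\\ &= \frac{(-1)^p\,p!}{(n + 1)^p}\sum{}^{*}(-n)^{j_0}\binom{\nu}{j_0}\binom{\nu}{j_1}\cdots\binom{\nu}{j_n}, \end{aligned}$$

so that

$$\pi_p(\nu) = (-1)^p\,p!\,c_p/(n + 1)^p, \tag{2.4}$$

where, for each $k$, $c_k$ denotes the sum $\sum(-n)^{j_0}\binom{\nu}{j_0}\cdots\binom{\nu}{j_n}$ over all non-negative $j_0, \ldots, j_n$ with $j_0 + \cdots + j_n = k$. It is easy to verify that

$$\sum_{k=0}^{\infty}c_kz^k = (1 - nz)^{\nu}(1 + z)^{n\nu}. \tag{2.5}$$

Thus

$$c_p = \sum_{j=0}^{p}(-n)^j\binom{\nu}{j}\binom{n\nu}{p - j}. \tag{2.6}$$

Lemma 2.1 follows immediately from (2.4).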





1030 R. G. LAHA, BE. LUKACS AND M. NEWMAN 


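Lemma 2.2. Let

$$P = P(x_0, x_1, \ldots, x_n) = \sum_{i=0}^{n}(x_i - \bar{x})^p,$$

where $\bar{x} = (n + 1)^{-1}\sum_{i=0}^{n}x_i$. If $(p - 1)!$ is not divisible by $n$, the adjoint polynomial $\pi_p(\nu)$ of $P$ has no non-zero integer roots.

Proof. By symmetry, $\pi_p(\nu)$ is $n + 1$ times the adjoint polynomial found in Lemma 2.1. Suppose that for some integer $\nu \ne 0$ we have $\pi_p(\nu) = 0$. Then, for that value of $\nu$, $c_p = 0$ and, according to (2.6),

$$\binom{n\nu}{p} = n\binom{\nu}{1}\binom{n\nu}{p - 1} - n^2\binom{\nu}{2}\binom{n\nu}{p - 2} + \cdots + (-1)^{p-1}n^p\binom{\nu}{p}.$$

Thus, multiplying by $p!$ and cancelling the common factor $n\nu$, we find that

$$(n\nu - 1)(n\nu - 2)\cdots(n\nu - p + 1) \equiv 0 \pmod n,$$

so that, since $n\nu - k \equiv -k \pmod n$, $(p - 1)! \equiv 0 \pmod n$. Lemma 2.2 follows immediately.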
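Lemma 2.1 and the conclusion of Lemma 2.2 are easy to check by brute force for small $n$ and $p$. The sketch below computes the adjoint polynomial of $(x_0 - \bar{x})^p$ directly from the definition (2.2) and from the closed form of Lemma 2.1, using exact rational arithmetic and generalized binomial coefficients so that negative integers $\nu$ are allowed. The particular values of $n$, $p$ and the search range for $\nu$ are arbitrary; a finite search of course illustrates, rather than proves, Lemma 2.2.

```python
from fractions import Fraction
from math import factorial

def falling(nu, j):
    """nu^{(j)} = nu (nu - 1) ... (nu - j + 1), with nu^{(0)} = 1."""
    out = 1
    for k in range(j):
        out *= nu - k
    return out

def binom(x, j):
    """Generalized binomial coefficient x^{(j)} / j! (x may be any integer)."""
    return Fraction(falling(x, j), factorial(j))

def compositions(p, parts):
    """All tuples of `parts` non-negative integers summing to p."""
    if parts == 1:
        yield (p,)
        return
    for first in range(p + 1):
        for rest in compositions(p - first, parts - 1):
            yield (first,) + rest

def adjoint_by_definition(n, p, nu):
    """Adjoint of (x_0 - xbar)^p: expand, then replace each x_i^{j_i} by nu^{(j_i)} as in (2.2)."""
    a = [Fraction(n, n + 1)] + [Fraction(-1, n + 1)] * n    # x_0 - xbar = sum_i a_i x_i
    total = Fraction(0)
    for js in compositions(p, n + 1):
        term = Fraction(factorial(p))                       # multinomial coefficient ...
        for ai, ji in zip(a, js):
            term *= ai ** ji * falling(nu, ji) / factorial(ji)
        total += term
    return total

def adjoint_lemma_2_1(n, p, nu):
    """Closed form of Lemma 2.1, i.e. (2.4) with c_p taken from (2.6)."""
    c_p = sum((-n) ** j * binom(nu, j) * binom(n * nu, p - j) for j in range(p + 1))
    return (-1) ** p * factorial(p) * c_p / (n + 1) ** p

print(all(adjoint_by_definition(3, 4, nu) == adjoint_lemma_2_1(3, 4, nu) for nu in range(-5, 6)))
# n = 7, p = 4: (p - 1)! = 6 is not divisible by 7, so only the trivial root nu = 0 should appear.
print([nu for nu in range(-40, 41) if adjoint_lemma_2_1(7, 4, nu) == 0])
```

The root $\nu = 0$ is always present, because every $\nu^{(j)}$ with $j \ge 1$ vanishes there; this is why Lemma 2.2 speaks only of non-zero roots.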


3. Some analytical results. Let $P(x_1, x_2, \ldots, x_N)$ be a polynomial of degree $p \ge 1$. We say that it is an admissible polynomial if the coefficients of the terms $x_j^p$ $(j = 1, 2, \ldots, N)$ are not zero.

We state the following lemma which is due to Zinger [2].

Lemma 3.1. Let $X_1, X_2, \ldots, X_N$ be a sample of size $N$ from a certain population. Let $P = P(X_1, X_2, \ldots, X_N)$ be an admissible polynomial statistic and let $\Lambda = \sum_{j=1}^{N}X_j$. If $P$ and $\Lambda$ are independently distributed then the common characteristic function $f(t)$ of $X_1, X_2, \ldots, X_N$ is an entire function of finite order.

Lemma 3.2 (Theorem of Marcinkiewicz). Let $P_m(t)$ be a polynomial of degree $m$ and suppose that $f(t) = \exp[P_m(t)]$ is a characteristic function. Then the degree $m$ of $P_m(t)$ cannot exceed 2.

Lemma 3.3. Suppose that the conditions of Lemma 3.1 are satisfied and that the characteristic function $f(t)$ has no zeros in the entire complex plane. Then the population is normal.

Proof. The function $f(t)$ is an entire function of finite order without zeros. According to Hadamard's factorization theorem, $f(t) = \exp[P_m(t)]$ for some polynomial $P_m(t)$. The statement of Lemma 3.3 then follows from the theorem of Marcinkiewicz.

Before proceeding further we introduce a special class of polynomials. Let

$$P(x_1, x_2, \ldots, x_N) = \sum_{j_1+\cdots+j_N \le p}A_{j_1\cdots j_N}x_1^{j_1}\cdots x_N^{j_N}$$

be a polynomial of degree $p$. It can be written as the sum

$$P(x_1, x_2, \ldots, x_N) = P_0(x_1, x_2, \ldots, x_N) + P_1(x_1, x_2, \ldots, x_N),$$

where

$$P_0(x_1, x_2, \ldots, x_N) = \sum_{j_1+\cdots+j_N = p}A_{j_1\cdots j_N}x_1^{j_1}\cdots x_N^{j_N}$$

is a homogeneous polynomial of degree $p$, while $P_1(x_1, x_2, \ldots, x_N)$ is a polynomial of degree less than $p$. We say that the polynomial $P(x_1, x_2, \ldots, x_N)$ is non-singular if the following two conditions are satisfied:

(i) $P(x_1, x_2, \ldots, x_N)$ contains the $p$th power of at least one variable.

(ii) The adjoint polynomial $\pi_p(\nu)$ of $P_0(x_1, x_2, \ldots, x_N)$ does not have a positive integer root.

For the proof of the theorem we need the following lemma. 

Lemma 3.4. Let $X_1, X_2, \ldots, X_N$ be a sample of size $N$ from a certain population. Let $P = P(X_1, X_2, \ldots, X_N)$ be a non-singular admissible polynomial statistic and let $\Lambda = \sum_{j=1}^{N}X_j$. If $P$ and $\Lambda$ are independently distributed, then the population is normal.

Lemma 3.4 is due to Linnik [1]. In his paper Linnik made the additional assumption that the population distribution function has moments up to a certain order. In view of Lemma 3.1 (due to Zinger) this assumption is superfluous. Since Linnik's article [1] is not easily accessible, while Zinger [2] only states (a somewhat generalized version of) Lemma 3.4 without proof, we give its derivation here.

Since $P$ and $\Lambda$ are independent, we conclude from Lemma 3.1 that the common characteristic function $f(z)$ of the random variables $X_1, X_2, \ldots, X_N$ is an entire function of finite order. The relation

$$E(Pe^{iz\Lambda}) = E(P)\,E(e^{iz\Lambda}) \tag{3.1}$$

holds for all complex $z$ ($z = t + iv$; $t, v$ real). First we show that the function $f(z)$ has no zeros in the entire complex plane. We write

$$f^{(j)} = f^{(j)}(z) = (d^j/dz^j)f(z) = i^jE(X_1^je^{izX_1})$$

and note that $f^{(0)} = f(z) = f$. We see from (3.1) that

$$\sum_{j_1+\cdots+j_N \le p}A_{j_1\cdots j_N}\,i^{p-(j_1+\cdots+j_N)}f^{(j_1)}\cdots f^{(j_N)} = C[f(z)]^N, \tag{3.2}$$

where $C = i^pE(P)$.

We give an indirect proof and assume therefore that the function $f(z)$ has zeros. Let the point $z = z_0$ be one of the zeros of $f(z)$ which are nearest to the origin and denote the order of the zero $z_0$ by $\nu$ ($\nu$ a positive integer). We show that this assumption leads to a contradiction.

Since $f(z)$ does not vanish in the circle $|z| < |z_0|$, we may divide (3.2) by $[f(z)]^N$ and obtain

$$R_0 + R_1 = C, \tag{3.3}$$

where

$$R_0 = \sum_{j_1+\cdots+j_N = p}A_{j_1\cdots j_N}\,\frac{f^{(j_1)}\cdots f^{(j_N)}}{f^N} \tag{3.4}$$

and

$$R_1 = \sum_{j_1+\cdots+j_N < p}A_{j_1\cdots j_N}\,i^{p-(j_1+\cdots+j_N)}\,\frac{f^{(j_1)}\cdots f^{(j_N)}}{f^N}.$$







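Let $\varphi = \varphi(z) = \ln f(z)$. It is then easily verified that

$$f^{(j)}/f = (\varphi')^j + \theta_j(\varphi', \varphi'', \ldots, \varphi^{(j)}) \qquad (j = 1, 2, \ldots), \tag{3.5}$$

where $\theta_j$ is a polynomial in $\varphi', \varphi'', \ldots, \varphi^{(j)}$. We also write $(\varphi')^0 = 1$ and $\theta_0 = 0$. We substitute (3.5) into (3.3) and get, for $|z| < |z_0|$,

$$S_0 + S_1 = C, \tag{3.6}$$

where

$$S_0 = \sum_{j_1+\cdots+j_N = p}A_{j_1\cdots j_N}[(\varphi')^{j_1} + \theta_{j_1}]\cdots[(\varphi')^{j_N} + \theta_{j_N}] \tag{3.7}$$

and

$$S_1 = \sum_{j_1+\cdots+j_N < p}A_{j_1\cdots j_N}\,i^{p-(j_1+\cdots+j_N)}[(\varphi')^{j_1} + \theta_{j_1}]\cdots[(\varphi')^{j_N} + \theta_{j_N}].$$

Since

$$f(z) = (z - z_0)^{\nu}g(z), \tag{3.8}$$

where $g(z)$ is an entire function and $g(z_0) \ne 0$, it is easy to verify that $\varphi'(z) = \nu/(z - z_0) + h_1(z)$, and, in general, that

$$\varphi^{(j)}(z) = (-1)^{j-1}(j - 1)!\,\nu\,(z - z_0)^{-j} + h_j(z) \qquad (j = 1, 2, \ldots). \tag{3.9}$$

The functions $h_j(z)$ are regular at the point $z = z_0$. We substitute (3.9) into (3.6) and see that

$$\frac{\gamma_\nu}{(z - z_0)^p} + \frac{\delta_{p-1}}{(z - z_0)^{p-1}} + \cdots + \frac{\delta_1}{z - z_0} + H(z) = C, \tag{3.10}$$

where $\delta_1, \ldots, \delta_{p-1}$ are constants and $H(z)$ is regular at the point $z = z_0$. We show next that $\gamma_\nu \ne 0$ and note that relation (3.10) therefore leads to a contradiction.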

We remark that $\gamma_\nu$ depends only on $\nu$ and on the coefficients of the homogeneous polynomial $P_0(X_1, X_2, \ldots, X_N)$. We see that $\gamma_\nu$ is the coefficient of $(z - z_0)^{-p}$ in the expression which we obtain by substituting (3.9) into $S_0$ in (3.7). We get the same value for the coefficient of $(z - z_0)^{-p}$ if we substitute (3.8) into $R_0$ in (3.4). Since $f(0) = 1$ we see from (3.8) that $g(0) = C_1 \ne 0$. We note that $\gamma_\nu$ is also the coefficient of $(z - z_0)^{-p}$ in the expression obtained by substituting $\psi(z) = C_1(z - z_0)^{\nu}$ instead of $f(z)$ into the right-hand side of (3.4). We get

$$\psi^{(j)}(z) = C_1\nu^{(j)}(z - z_0)^{\nu - j} \qquad (j = 0, 1, 2, \ldots, \nu).$$

Therefore

$$\sum_{j_1+\cdots+j_N = p}A_{j_1\cdots j_N}\,\frac{\psi^{(j_1)}\cdots\psi^{(j_N)}}{\psi^N} = \pi_p(\nu)\,(z - z_0)^{-p}.$$

Thus $\gamma_\nu = \pi_p(\nu)$, the adjoint polynomial of $P_0$. Since $P$ is a non-singular polynomial, $\pi_p(\nu)$ does not vanish for any positive integer $\nu$, so that $\gamma_\nu \ne 0$. This leads to the desired contradiction in (3.10), so that $f(z)$ has no zeros.

The proof of Lemma 3.4 then follows from Lemma 3.3.


4. Proof of the Theorem. We show first that the condition is sufficient. It follows from Lemma 2.2 (with $n = N - 1$) that the sample central moment $m_p$ is a non-singular polynomial statistic if $(p - 1)!$ is not divisible by $N - 1$. Hence the theorem is an immediate consequence of Lemma 3.4. The necessity of the condition follows from the well-known fact that in a normal population any translation-invariant statistic is independent of the sample mean.


REFERENCES 


[1] Yu. V. Linnik, "On polynomial statistics in connection with the analytic theory of differential equations," Vestnik Leningrad Univ., Vol. 11, No. 1 (1956), pp. 35-48. (Russian)

[2] A. A. Zinger, "Independence of quasi-polynomial statistics and analytical properties of distributions," Teoriya Veroyatnostey i yeye Primeneniya, Vol. 3 (1958), pp. 265-284. (Russian)





ORDER STATISTICS OF PARTIAL SUMS¹


By J. G. WENDEL 
University of Michigan 

1. Introduction. In recent years there have emerged a number of remarkable 
identities, which connect the distributions of certain quantities arising in the 
fluctuation theory of sums of independent random variables with the distribu- 
tions of much simpler quantities depending only on the individual partial sums. 
Authors contributing to the theory include Baxter [1], [2], Pollaczek [4], Sparre 
Andersen [5], [6] and Spitzer [7]. 

The present paper is a further study along these lines; its chief aim is to link 
up certain results of Spitzer and Pollaczek, to be described in detail in the next 
section. The method used is an extension of that presented in [8] and may be 
described as algebraic, in contrast to Spitzer’s combinatorial approach and the 
function-theoretic treatment by Pollaczek. A similar algebraic approach has 
been developed by Baxter. 

An outline of the paper is as follows. In Section 2 we collect definitions and 
state the main results. These all stem from a fundamental integral equation, 
whose derivation is the theme of Section 3. The algebraic tools required to treat 
the integral equation are developed and applied in Section 4. In Section 5 we 
give the proofs of the results stated in Section 2 and make some additional re- 
marks. In Section 6 certain formulas for continuous-time additive processes are 
obtained by a passage to the limit. 


2. Definitions and chief results. Let $\{X_n\}$ be a sequence of independent random variables with common distribution function $F(x) = \Pr\{X_n < x\}$ and characteristic function $\varphi$. Let $\{S_n\}$ be the sequence of partial sums, with $S_0 = 0$. For a real number $x$ let $N_n(x)$ denote the number of $S_k$, $0 \le k \le n$, that exceed or equal $x$, and write $N_n = N_n(0 + 0)$, the number of positive partial sums among the first $n$. The order statistics of the $n + 1$ quantities $S_0, S_1, \ldots, S_n$ are designated by $R_{n,0} \ge R_{n,1} \ge \cdots \ge R_{n,n}$, and it is convenient to write $\overline{R}_n = \max_{0 \le k \le n}S_k = R_{n,0}$ and $\underline{R}_n = \min_{0 \le k \le n}S_k = R_{n,n}$. For a real number $x$ write $x^+ = \max(x, 0)$, $x^- = \min(x, 0)$ and $e(x) = 1$ or $0$ according as $x$ is positive or not. Note that the order statistics of $S_k^+$, $0 \le k \le n$, are precisely the numbers $R_{n,k}^+$. If $R$ and $S$ are random variables we write

$$E(\exp i[\rho R + \sigma S]) = \mathrm{cf}(R \,\&\, S),$$

suppressing the real arguments $\rho$ and $\sigma$ on the right, as may be done without

Received February 9, 1960.

¹ A preliminary version of this paper [9] was contributed to the Evanston meeting of the American Mathematical Society, November 29, 1958. A later version was presented at the Purdue meeting of the Institute of Mathematical Statistics, April 7-9, 1960, as an Invited Paper. The author is grateful to USASADEA for partial support during this work.





ORDER STATISTICS OF PARTIAL SUMS 1035 


ambiguity; similarly $\mathrm{cf}(R)$ is understood to be the characteristic function of $R$, with argument $\rho$. Finally, for a random variable $Y$ and event $A$ we write

$$E(Y; A) = E(YI_A),$$

where $I_A$ is the indicator of $A$.
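To fix the definitions, the sketch below computes, for one hypothetical realization $x_1, \ldots, x_n$, the partial sums $S_0, \ldots, S_n$, the count $N_n(x)$, the number $N_n$ of positive partial sums, and the order statistics $R_{n,0} \ge \cdots \ge R_{n,n}$. It also checks, for this realization, the remark that the order statistics of the $S_k^+$ are the $R_{n,k}^+$.

```python
def partial_sums(xs):
    """S_0 = 0, S_1, ..., S_n for the steps xs = (x_1, ..., x_n)."""
    sums = [0]
    for x in xs:
        sums.append(sums[-1] + x)
    return sums

def N_of(sums, x):
    """N_n(x): number of S_k, 0 <= k <= n, with S_k >= x."""
    return sum(1 for s in sums if s >= x)

def N_positive(sums):
    """N_n = N_n(0 + 0): number of strictly positive partial sums."""
    return sum(1 for s in sums if s > 0)

def order_statistics(sums):
    """R_{n,0} >= R_{n,1} >= ... >= R_{n,n}; R_{n,0} is the maximum, R_{n,n} the minimum."""
    return sorted(sums, reverse=True)

S = partial_sums([2, -3, 1, 4])          # hypothetical steps
print(S)                                 # [0, 2, -1, 0, 4]
print(N_of(S, 0), N_positive(S))         # 4 2
print(order_statistics(S))               # [4, 2, 0, 0, -1]
# Order statistics of S_k^+ coincide with the positive parts R_{n,k}^+:
print(sorted((max(s, 0) for s in S), reverse=True) == [max(r, 0) for r in order_statistics(S)])
```

The difference between $N_n(0) = 4$ and $N_n = 2$ here comes from the partial sums equal to 0 (including $S_0$), which is why the paper takes the right-hand limit $N_n(0 + 0)$.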

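We can now state the results. The first theorem concerns relations between generating functions of certain characteristic functions.

Theorem 2.1. The following identities hold, if $|w|$ and $|z|$ are less than one:

$$\sum_{n=0}^{\infty}w^n\sum_{k=0}^{n}z^k\,\mathrm{cf}(R_{n,k} \,\&\, S_n) = \exp\Big\{\sum_{n=1}^{\infty}n^{-1}\big[w^n\,\mathrm{cf}(S_n^+ \,\&\, S_n) + (wz)^n\,\mathrm{cf}(S_n^- \,\&\, S_n)\big]\Big\}, \tag{2.1a}$$

$$\sum_{n=0}^{\infty}w^n\sum_{k=0}^{n}z^k\,\mathrm{cf}(R_{n,k}^+ \,\&\, S_n) = \{(1 - z)(1 - wz\varphi(\sigma))\}^{-1}\Big\{\exp\Big[\sum_{n=1}^{\infty}n^{-1}(w^n - (wz)^n)\,\mathrm{cf}(S_n^+ \,\&\, S_n)\Big] - z\Big\}, \tag{2.1b}$$

$$\sum_{n=0}^{\infty}w^n\,\mathrm{cf}(N_n \,\&\, S_n) = \exp\sum_{n=1}^{\infty}n^{-1}w^n\,\mathrm{cf}(ne(S_n) \,\&\, S_n). \tag{2.1c}$$

The above remain true when the second argument, $S_n$, is deleted from all cf symbols, i.e. when $\sigma = 0$.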

Note that the cf's appearing on the right are in principle determined solely by the distributions of the individual $S_n$; thus, for example,

$$\mathrm{cf}(ne(S_n) \,\&\, S_n) = [\exp(i\rho n) - 1]\,E(\exp(i\sigma S_n); S_n > 0) + \varphi(\sigma)^n.$$

Formula (2.1b) provides the link between the results of Spitzer and Pollaczek alluded to in Section 1. Spitzer [7] proved the case of (2.1b) in which $z = 0$, namely, the identity

$$\sum_{n=0}^{\infty}w^n\,\mathrm{cf}(\overline{R}_n \,\&\, S_n) = \exp\sum_{n=1}^{\infty}n^{-1}w^n\,\mathrm{cf}(S_n^+ \,\&\, S_n),$$

and deduced many interesting consequences. Pollaczek [4] obtained a version of (2.1b) in which $S_n$ is omitted (i.e. $\sigma = 0$), in the case where $X_n$ has a moment generating function. (His formula differs in appearance from the present one, in that he defined $R_{n,k}^+ = 0$ for $k > n$ and carried the inner summation to $k = \infty$, and he obtained the analogous exponent on the right-hand side of the equation as a contour integral. However, the two expressions are readily shown to agree.)

Two identities are stated in the next theorem. The first of these enables the 
joint distribution of an order statistic and the corresponding partial sum to be 
calculated from the joint distributions of extreme values and partial sums, 
while the second can be viewed as relating the conditional distribution of $S_n$,





1036 J. G. WENDEL 


given that a specified number of partial sums are positive, to conditional dis- 
tributions given that all (resp. none) are positive. (See also equations (5.6) 
below. ) 

Theorem 2.2. The relations $a_{n,k} = a_{n-k,0}\,a_{k,k}$ hold when either

$$a_{n,k} = \mathrm{cf}(R_{n,k} \,\&\, S_n) \tag{2.2a}$$

or

$$a_{n,k} = E(\exp(i\sigma S_n); N_n = k). \tag{2.2b}$$

The case (2.2a) [with $\sigma = 0$] was proved combinatorially by Bohnenblust, Spitzer and Welch, and was brought to the author's attention by Spitzer; see also Theorem 5.1 below. Sparre Andersen [5] proved (2.2b), also with $\sigma = 0$.
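The case $\sigma = 0$ of (2.2b), i.e. $\Pr\{N_n = k\} = \Pr\{N_{n-k} = 0\}\Pr\{N_k = k\}$, can be checked by exact enumeration when $X_1$ takes only finitely many values. The sketch below does this for a hypothetical law with mass 1/3 at $-2$ and 2/3 at $1$ (so that partial sums equal to 0 do occur), and for the symmetric law on $\{-1, 1\}$. It illustrates, but of course does not prove, the identity.

```python
from fractions import Fraction
from itertools import product

def count_positive(path):
    """N_n for one path: the number of partial sums S_1, ..., S_n that are > 0."""
    s, count = 0, 0
    for x in path:
        s += x
        count += s > 0
    return count

def law_of_N(n, dist):
    """Exact [P(N_n = 0), ..., P(N_n = n)] for i.i.d. steps with finite law dist = {value: prob}."""
    probs = [Fraction(0)] * (n + 1)
    for path in product(dist, repeat=n):
        weight = Fraction(1)
        for x in path:
            weight *= dist[x]
        probs[count_positive(path)] += weight
    return probs

dist = {-2: Fraction(1, 3), 1: Fraction(2, 3)}     # hypothetical step distribution
laws = [law_of_N(m, dist) for m in range(7)]
ok = all(laws[n][k] == laws[n - k][0] * laws[k][k]
         for n in range(7) for k in range(n + 1))
print(ok)
print(law_of_N(3, {-1: Fraction(1, 2), 1: Fraction(1, 2)}))
```

Enumeration grows like (number of values)$^n$, so this is only practical for small $n$; the point of Theorem 2.2 is that no such computation is needed.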


3. The integral equation. Let $z$ be a complex number, $|z| < 1$, and let $\sigma$ be real. Define

$$h_n(x) = h_n(x, z, \sigma) = E(z^{N_n(x)}\exp(i\sigma S_n)).$$

The functions $h_n(x)$ are left-continuous and have total variation on $-\infty < x < \infty$ not exceeding one. Then

$$g_n(x) = h_n(x) - h_n(-\infty) = h_n(x) - z^{n+1}\varphi(\sigma)^n$$

can be considered as the distribution functions of uniformly bounded complex-valued measures $g_n$ on the real line.

A recurrence relation for $h_n$ can be easily obtained by calculating $E(\cdots)$ as $E(E(\cdots \mid X_1))$, and using the facts that if $X_1 = y$ then

$$N_{n+1}(x) = 1 - e(x) + N_n^*(x - y), \qquad S_{n+1} = S_n^* + y,$$

where the starred quantities depend on $X_2, X_3, \dots, X_{n+1}$ in the same fashion that the original ones do on $X_1, X_2, \dots, X_n$. Setting


t(z) = 2°” = e(x) + (1 — e(z))z 
we have the immediate relations ho(z) = t(z), 
hass(z) = tz) [hala — y) exp (iay) dfty). 


In terms of the $g_n$ this becomes

$$g_0(x) = t(x) - z = (1 - z)e(x), \tag{3.1}$$

$$g_{n+1}(x) = (1 - z)z^{n+1}\varphi(\sigma)^{n+1}e(x) + t(x)\int g_n(x - y)\exp(i\sigma y)\, df(y). \tag{3.2}$$


For $|w| < 1$ let $g(x) = \sum_{n=0}^{\infty} w^n g_n(x)$, the generating function of the $g_n$. It is again left-continuous and of bounded variation. Multiplying (3.2) by $w^{n+1}$,





ORDER STATISTICS OF PARTIAL SUMS 


summing on $n = 0, 1, \dots$ and using (3.1) we obtain

$$g(x) = c\,e(x) + w\,t(x)\int g(x - y)\exp(i\sigma y)\, df(y), \tag{3.3}$$

where for brevity we have written $c = (1 - z)(1 - wz\varphi(\sigma))^{-1}$. This is the basic integral equation. (The discontinuous factor $t(x)$ gives it a kind of "Wiener-Hopf" flavor; however, we shall not discuss (3.3) in that context. The author has recently learned that Widom has treated an equation similar to (3.3) by Wiener-Hopf techniques in order to obtain new probabilistic limit theorems.)


4. Algebraic considerations. Let $\mathfrak{X}$ be a commutative Banach algebra with identity $e$. For $f \in \mathfrak{X}$ the elements $\exp f$, $(e - f)^{-1}$ and $\log(e - f)$ are defined by their Maclaurin series, the latter two only when $\|f\| < 1$. Then

$$\exp(f_1 + f_2) = (\exp f_1)(\exp f_2)$$

and $(e - f)^{-1} = \exp\{-\log(e - f)\}$. Let $P$ be a bounded linear operator on $\mathfrak{X}$ and let $f$ be a given element of $\mathfrak{X}$. In order to discuss operations of the form $P(fg)$, $P[f\{P(fg)\}]$, $\dots$, acting on $g \in \mathfrak{X}$, it is convenient to introduce the operator $F$, which sends $g$ into $fg$. Then alternating multiplications by $f$ and operations by $P$ can be simply expressed as powers of the operator $PF$. If $g$ is an unknown element of $\mathfrak{X}$ satisfying the equation $g = e + P(fg)$ then, leaving aside questions of uniqueness, existence and convergence for the moment, the solution ought to be given by $g = \sum_{n=0}^{\infty}(PF)^n e$, or, more compactly, by $g = (I - PF)^{-1}e$, where of course $I$ is the identity operator. More generally, we are going to want to solve an equation of the form


$$g = ce + P(f_1 g) + (I - P)(f_2 g), \tag{4.1}$$

where $c$ is a constant and $f_1$, $f_2$ are given. The following theorem gives a special set of sufficient conditions under which $g$ can be found explicitly. As shown by Baxter, at least when $f_2 = 0$, the conditions are stronger than necessary, but adequate for our purposes.

THEOREM 4.1. Let $P$ be a projection on $\mathfrak{X}$, i.e. $P^2 = P$, such that $P\mathfrak{X}$ and $(I - P)\mathfrak{X}$ are subalgebras, then necessarily closed in norm. Suppose that the elements $f_1$, $f_2$ and the operator $PF_1 + (I - P)F_2$ have norms less than one. Then (4.1) has the unique solution

$$g = c\exp\bigl[-P\log(e - f_1) - (I - P)\log(e - f_2)\bigr]. \tag{4.2}$$

If $\psi$ is a continuous homomorphism of $\mathfrak{X}$ to the complex numbers, i.e. a multiplicative linear functional on $\mathfrak{X}$, then

$$\psi(g) = c\exp\sum_{n=1}^{\infty}n^{-1}\psi\bigl(Pf_1^n + (I - P)f_2^n\bigr). \tag{4.3}$$


PROOF: It is plainly enough to consider the case $c = 1$. If $Q$ is any operator such that $\|Q\| < 1$, it is well known that $(I - Q)^{-1}$ exists as a bounded operator. It follows that $g = (I - [PF_1 + (I - P)F_2])^{-1}e$ is the unique solution to (4.1) (with $c = 1$). Let $h$ denote the right member of (4.2). Then $(e - f_1)h = h\exp\log(e - f_1)$ has the form $\exp(I - P)h_1$, with $h_1 = \log(e - f_1) - \log(e - f_2)$. Expanding the exponential function and using the fact that $(I - P)\mathfrak{X}$ is a closed subalgebra, it follows that

$$(e - f_1)h = e + (I - P)h_2 \tag{4.4}$$

for some $h_2 \in \mathfrak{X}$. In a similar way we have

$$(e - f_2)h = e + Ph_3 \tag{4.5}$$

for some $h_3 \in \mathfrak{X}$. Apply $P$ to both sides of (4.4) and $(I - P)$ to (4.5), then add the resulting equations. The result is

$$P[(e - f_1)h] + (I - P)[(e - f_2)h] = e,$$

which on rearrangement shows that $g = h$ satisfies (4.1), thereby proving (4.2). Equation (4.3) follows at once, being included in the statement only for ease of reference.

The next theorem gives an explicit formula for $Pg$ in the subcase that we will actually confront.

THEOREM 4.2. If in addition to the hypotheses of the previous theorem we have $Pe = e$ and $f_2 = zf_1$ for a scalar $z \neq 1$, then

$$Pg = c(1 - z)^{-1}\bigl\{\exp\bigl(-P[\log(e - f_1) - \log(e - f_2)]\bigr) - ze\bigr\}; \tag{4.6}$$

if $\psi$ is a homomorphism on $P\mathfrak{X}$, then

$$\psi(Pg) = c(1 - z)^{-1}\Bigl\{\exp\Bigl[\sum_{n=1}^{\infty}n^{-1}(1 - z^n)\psi(Pf_1^n)\Bigr] - z\Bigr\}. \tag{4.7}$$

PROOF: We write (4.1) (with $c = 1$) in the form

$$g = e + zf_1 g + (1 - z)P(f_1 g). \tag{4.8}$$

Applying $P$ to (4.8) we obtain

$$Pg = e + P(f_1 g); \tag{4.9}$$

then eliminating $P(f_1 g)$ between (4.8) and (4.9) gives $(1 - z)Pg = (e - f_2)g - ze$, and inserting (4.2) for $g$ yields (4.6). Relation (4.7) is immediate.

We shall now show that the integral equation (3.3) can be viewed as an instance of the algebraic equation (4.1), with the conditions of Theorems 4.1 and 4.2 satisfied. Temporarily, however, we have to restrict the moduli of $w$ and $z$ somewhat more severely than required in the derivation of (3.3).

Let $\mathfrak{X}$ be the algebra of bounded complex-valued measures on the line, with convolution as product operation and norm equal to total variation. $\mathfrak{X}$ has identity element $e$, representing a unit mass at the origin; of course the distribution function corresponding to $e$ is the function $e(x)$ defined in Section 2. $P$ will be the operation that throws all mass lying to the left of the origin onto the origin; expressed in terms of distribution functions, $P$ sends $g(x)$ into $e(x)g(x)$ [ordinary multiplication!]. The measures of the form $Pg$ are those whose support is contained in $0 \le x < \infty$; the convolution of two such measures is easily seen to be another of the same kind. Thus $P\mathfrak{X}$ is a subalgebra. Similarly, $(I - P)\mathfrak{X}$ is a subalgebra, for its elements $g$ are characterized by having $-\infty < x \le 0$ as support and $g\{(-\infty, 0]\} = 0$. Clearly $Pe = e$ and $\|P\| = 1$. For a bounded measurable function $k(x)$ we note the formula


$$\int k(x)\, d(Pg)(x) = \int k(x^+)\, dg(x). \tag{4.10}$$

Let $w$ and $z$ be complex numbers such that $|z| < 1$ and

$$|w|\,\{|z| + |1 - z|\} < 1.$$

Let $f_1$ be the measure whose distribution function is

$$f_1(x) = w\int_{-\infty}^{x - 0}\exp(i\sigma y)\, df(y).$$

It is easy to see that the $n$-fold convolutions of $f_1$ and $f$ are related by the equation

$$f_1^n(x) = w^n\int_{-\infty}^{x - 0}\exp(i\sigma y)\, df^n(y). \tag{4.11}$$

Let $f_2 = zf_1$. Clearly $\|f_1\| \le |w| < 1$, $\|f_2\| \le |wz| < 1$ and $\|f_1 - f_2\| \le |w|\,|1 - z|$. Then the operator $PF_1 + (I - P)F_2 = F_2 + P(F_1 - F_2)$ has norm not exceeding $|w|\,|z| + |w|\,|1 - z| < 1$.

Recalling that $t(x) = e(x) + z(1 - e(x))$, we can write (3.3) in the form $g = ce + P(f_1 g) + z(I - P)(f_1 g)$, as claimed. Hence its solution is given by (4.2), and the projection $Pg$ is given by (4.6). The solution is unsatisfactory in a certain sense, because its corresponding distribution function $g(x)$ is only determined after performing a rather large number of convolutions: $g$ has the form $\exp h = e + h + h^2/2! + \cdots$, and each term $h^n$ stands for an $n$-fold convolution. However, we only need the transforms (4.3) and (4.7), and these will be evaluated explicitly in the next section.


5. Proofs and remarks. We begin by deriving the equation (2.1a). The (random) function $N_n(x)$ is constant except at the points $x = x_k = R_{nk}$; clearly $N_n(x_k - 0) = N_n(x_k) = k + 1$, while $N_n(x_k + 0) = k$. Then the quantity $z^{N_n(x)}$ jumps by an amount $z^k(1 - z)$ as $x$ moves from left to right across $x_k$. Hence for the Fourier-Stieltjes transform we have

$$\int \exp(i\rho x)\, d_x z^{N_n(x)} = (1 - z)\sum_{k=0}^{n}\exp(i\rho x_k)z^k, \tag{5.1}$$

a formula which persists even if several $x_k$ happen to coincide.





Multiplying (5.1) by $\exp(i\sigma S_n)$ and interchanging expectation and transform signs, as is clearly legitimate, we obtain

$$\int \exp(i\rho x)\, dh_n(x) = \int \exp(i\rho x)\, dg_n(x) \overset{\text{def}}{=} \psi(g_n) = (1 - z)\sum_{k=0}^{n}\mathrm{cf}(R_{nk} \,\&\, S_n)\,z^k;$$

it is of course well known that $\psi$ is a continuous homomorphism on the algebra of bounded measures. Then by (4.3) we have for the left side of (2.1a)

$$\begin{aligned}
\sum_{n=0}^{\infty}w^n\sum_{k=0}^{n}\mathrm{cf}(R_{nk} \,\&\, S_n)\,z^k &= (1 - z)^{-1}\psi(g)\\
&= (1 - z)^{-1}c\exp\sum_{n=1}^{\infty}n^{-1}\psi\bigl(z^n f_1^n + (1 - z^n)Pf_1^n\bigr)\\
&= \exp\sum_{n=1}^{\infty}n^{-1}\bigl\{(wz)^n\varphi(\sigma)^n + z^n\psi(f_1^n) + (1 - z^n)\psi(Pf_1^n)\bigr\},
\end{aligned}\tag{5.2}$$

since $c = (1 - wz\varphi(\sigma))^{-1}(1 - z)$. To evaluate $\psi(Pf_1^n)$ we have, by (4.10) and (4.11),

$$\psi(Pf_1^n) = \int \exp(i\rho x^+)\, df_1^n(x) = w^n\int \exp(i\rho x^+ + i\sigma x)\, df^n(x) = w^n\,\mathrm{cf}(S_n^+ \,\&\, S_n).$$

Similarly, $\psi(f_1^n) = w^n\,\mathrm{cf}(S_n \,\&\, S_n)$. Then the expression in braces can be written

$$\begin{aligned}
&w^n\,\mathrm{cf}(S_n^+ \,\&\, S_n) + (wz)^n\bigl[\mathrm{cf}(0 \,\&\, S_n) + \mathrm{cf}(S_n \,\&\, S_n) - \mathrm{cf}(S_n^+ \,\&\, S_n)\bigr]\\
&\qquad = w^n\,\mathrm{cf}(S_n^+ \,\&\, S_n) + (wz)^n\,\mathrm{cf}(S_n^- \,\&\, S_n),
\end{aligned}$$

which, inserted on the right side of (5.2), yields the equation (2.1a). Since both sides converge for $|w| < 1$, $|z| < 1$, analytic continuation shows that (2.1a) holds throughout that region, and not merely in the smaller region of Section 4.
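Written out, the identity just obtained reads $\sum_{n\ge 0}w^n\sum_{k=0}^{n}z^k\,\mathrm{cf}(R_{nk}\,\&\,S_n) = \exp\sum_{n\ge 1}n^{-1}\{w^n\,\mathrm{cf}(S_n^+\,\&\,S_n) + (wz)^n\,\mathrm{cf}(S_n^-\,\&\,S_n)\}$. Take an integer-valued step law. Then the coefficient of $w^n$ on each side is a Laurent polynomial in $e^{i\rho}$ and $e^{i\sigma}$, and a polynomial in $z$. One may therefore substitute nonzero rationals $u$, $v$ for $e^{i\rho}$, $e^{i\sigma}$ and compare coefficients exactly. The Python sketch below does this. The step law, the values of `U`, `V`, `Z` and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Illustrative integer step law (an assumption): P(X = 2) = 1/4, P(X = -1) = 3/4.
STEPS = {2: Fraction(1, 4), -1: Fraction(3, 4)}
# Rational stand-ins for exp(i rho), exp(i sigma) and z (any nonzero U, V will do).
U, V, Z = Fraction(2, 3), Fraction(5, 4), Fraction(1, 2)
N = 6  # compare the coefficients of w^0, ..., w^N


def walks(n):
    """Yield (probability, [S_0, S_1, ..., S_n]) for every step sequence of length n."""
    for steps in product(STEPS, repeat=n):
        prob, s, sums = Fraction(1), 0, [0]
        for x in steps:
            prob *= STEPS[x]
            s += x
            sums.append(s)
        yield prob, sums


def left_coefficient(n):
    """sum_k z^k E(u^{R_nk} v^{S_n}), where R_n0 >= ... >= R_nn order S_0, ..., S_n."""
    total = Fraction(0)
    for prob, sums in walks(n):
        ordered = sorted(sums, reverse=True)
        weight = sum(Z ** k * U ** r for k, r in enumerate(ordered))
        total += prob * V ** sums[-1] * weight
    return total


def exponent_coefficient(n):
    """n^{-1} [E(u^{S_n^+} v^{S_n}) + z^n E(u^{S_n^-} v^{S_n})]."""
    plus = minus = Fraction(0)
    for prob, sums in walks(n):
        s = sums[-1]
        plus += prob * U ** max(s, 0) * V ** s
        minus += prob * U ** min(s, 0) * V ** s
    return (plus + Z ** n * minus) / n


def exp_series(a, n_max):
    """Coefficients b_0..b_{n_max} of exp(sum_{n>=1} a[n] w^n), from n b_n = sum_k k a_k b_{n-k}."""
    b = [Fraction(1)] + [Fraction(0)] * n_max
    for n in range(1, n_max + 1):
        b[n] = sum(k * a[k] * b[n - k] for k in range(1, n + 1)) / n
    return b


lhs = [left_coefficient(n) for n in range(N + 1)]
rhs = exp_series([Fraction(0)] + [exponent_coefficient(n) for n in range(1, N + 1)], N)
print(lhs == rhs)
```

Setting `Z = Fraction(0)` turns the same comparison into a check of the first relation of (5.6), which is Spitzer's formula.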

We remark that an analogous formula can easily be found for the order statistics $R'_{nk}$ of the first $n$ partial sums omitting $S_0$. In fact, if $X$ (with distribution function $f$) is independent of the pair $R_{n-1,k} \,\&\, S_{n-1}$, then

$$X + R_{n-1,k} \,\&\, X + S_{n-1}$$

have the same joint distribution as $R'_{nk} \,\&\, S_n$. Hence

$$\mathrm{cf}(R'_{nk} \,\&\, S_n) = \varphi(\rho + \sigma)\,\mathrm{cf}(R_{n-1,k} \,\&\, S_{n-1}),$$

and the generating function can be quickly written down.

The relation (2.1b) follows from (4.7) by the same method applied to $Pg$. In order to prove (2.1c) we observe that for measures of the form $Pg$ the mass at the origin defines a continuous homomorphism $\psi_0$ on $P\mathfrak{X}$, given by $\psi_0(Pg) = (Pg)(0 + 0) = g(0 + 0)$. With this definition of $\psi_0$ it follows easily that

$$\psi_0(Pg_n) = E\bigl(z^{N_n}\exp(i\sigma S_n)\bigr) - z^{n+1}\varphi(\sigma)^n$$

and therefore that

$$\psi_0(Pg) = \sum_{n=0}^{\infty}w^n E\bigl(z^{N_n}\exp(i\sigma S_n)\bigr) - z(1 - wz\varphi(\sigma))^{-1}. \tag{5.3}$$

Combining (5.3) with (4.7) we see that

$$\sum_{n=0}^{\infty}w^n E\bigl(z^{N_n}\exp i\sigma S_n\bigr) = \exp\sum_{n=1}^{\infty}n^{-1}\bigl\{(1 - z^n)\psi_0(Pf_1^n) + w^n z^n\varphi(\sigma)^n\bigr\}. \tag{5.4}$$

Clearly

$$\psi_0(Pf_1^n) = f_1^n(0 + 0) = w^n\int_{-\infty}^{0+0}\exp(i\sigma y)\, df^n(y) = w^n E\bigl([1 - e(S_n)]\exp i\sigma S_n\bigr).$$

Then the expression in braces can be written as

$$w^n E\bigl(\{(1 - z^n)(1 - e(S_n)) + z^n\}\exp i\sigma S_n\bigr) = w^n E\bigl(z^{n e(S_n)}\exp i\sigma S_n\bigr).$$

We substitute this on the right side of (5.4) and obtain

$$\sum_{n=0}^{\infty}w^n E\bigl(z^{N_n}\exp i\sigma S_n\bigr) = \exp\sum_{n=1}^{\infty}n^{-1}w^n E\bigl(z^{n e(S_n)}\exp i\sigma S_n\bigr). \tag{5.5}$$

Setting $z = r\exp i\rho$ and letting $r \to 1 - 0$ yields (2.1c).

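We turn now to the proof of Theorem 2.2. Both relations follow from a simple observation. Suppose a double power series $\sum_{n=0}^{\infty}w^n\sum_{k=0}^{n}a_{nk}z^k$ can be written as a product $\bigl(\sum_{n=0}^{\infty}b_n w^n\bigr)\bigl(\sum_{n=0}^{\infty}c_n (wz)^n\bigr)$ with $a_{00} = b_0 = c_0 = 1$. Then $a_{n0} = b_n$, $a_{nn} = c_n$ and $a_{nk} = a_{n-k,0}\,a_{kk}$. Applying this remark to (2.1a) we obtain (2.2a) immediately. The relation (2.2b) follows from (5.5) by noting that $E(z^{N_n}Y) = \sum_{k=0}^{n}z^k E(Y;\ N_n = k)$ and

$$E(z^{n e(S_n)}Y) = E(Y;\ S_n \le 0) + z^n E(Y;\ S_n > 0),$$

where $Y = \exp i\sigma S_n$. As by-products of the argument we have the relations

$$\begin{aligned}
\sum_{n=0}^{\infty}w^n\,\mathrm{cf}(R_{n0} \,\&\, S_n) &= \exp\sum_{n=1}^{\infty}n^{-1}w^n\,\mathrm{cf}(S_n^+ \,\&\, S_n),\\
\sum_{n=0}^{\infty}w^n E(\exp i\sigma S_n;\ N_n = n) &= \exp\sum_{n=1}^{\infty}n^{-1}w^n E(\exp i\sigma S_n;\ S_n > 0),
\end{aligned}\tag{5.6}$$

and analogues with $R_{n0}$ replaced by $R_{nn}$, $S_n^+$ by $S_n^-$, $N_n = n$ by $N_n = 0$ and $S_n > 0$ by $S_n \le 0$. The first of these is of course Spitzer's formula, since $R_{n0} = \max(S_0, S_1, \dots, S_n)$.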





In concluding this section we point out that various combinatorial identities can be derived from (2.1), (2.2) and (5.6), by a method sketched in [8]. To minimize the amount of new notation required we illustrate the idea on just one case.

Let $a_1, a_2, \dots, a_n$ be arbitrary real numbers, and let $\pi$ be a permutation of $1, 2, \dots, n$. Applying $\pi$ to the $a$'s we obtain $a_{\pi(1)}, a_{\pi(2)}, \dots, a_{\pi(n)}$. We set $S_k(\pi)$ equal to the sum of the first $k$ of these, and $S_0(\pi)$ equal to zero. These sums arranged in descending order of magnitude are designated by $R_{nk}(\pi)$.

Fix $k$, and let $D$ stand for an arbitrary $(n - k)$-element subset of $1, 2, \dots, n$; $D'$ is its complement. Let $\pi_D$, $\pi_{D'}$ be arbitrary permutations of the sets $D$, $D'$. Let $R_{n-k,0}(\pi_D)$ and $R_{kk}(\pi_{D'})$ be defined as the greatest and least of the obvious partial sums (the empty sum $0$ included). Then we have the following combinatorial identity, presumably already known to Bohnenblust, Spitzer and Welch.

THEOREM 5.1. There is a one-to-one correspondence $\pi \leftrightarrow (D, \pi_D, \pi_{D'})$ carrying the numbers $R_{nk}(\pi)$ into the sums $R_{n-k,0}(\pi_D) + R_{kk}(\pi_{D'})$.

PROOF: Let $p_1, p_2, \dots, p_n$ be nonnegative numbers with sum one; let the common distribution of independent random variables $X_j$ be specified by $\Pr\{X = a_j\} = p_j$. We form the partial sums $S_k$, the order statistics $R_{nk}$, and apply the identity (2.2a) with $\sigma = 0$. In the resulting formula we equate the coefficients of the product $p_1p_2\cdots p_n$, and obtain

$$\sum_{\pi}\exp i\rho R_{nk}(\pi) = \sum_{D,\,\pi_D,\,\pi_{D'}}\exp i\rho\{R_{n-k,0}(\pi_D) + R_{kk}(\pi_{D'})\},$$

which proves the result.
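Theorem 5.1 can be tested directly for small $n$. Consider the multiset of values $R_{nk}(\pi)$ over all $n!$ permutations. It should coincide with the multiset of sums (greatest partial sum of $\pi_D$) + (least partial sum of $\pi_{D'}$), where empty sums count as $0$, taken over all $(n-k)$-subsets $D$ and all permutations $\pi_D$, $\pi_{D'}$. Below is a brute-force Python sketch; the numbers in `a` and the function names are arbitrary choices.

```python
from itertools import combinations, permutations


def partial_sums(seq):
    """S_0 = 0, S_1, ..., S_len(seq) for the sequence taken in the given order."""
    sums, s = [0], 0
    for x in seq:
        s += x
        sums.append(s)
    return sums


def left_side(a, k):
    """Sorted values of R_nk(pi), the k-th largest partial sum (k = 0, 1, ...), over all pi."""
    return sorted(sorted(partial_sums(p), reverse=True)[k] for p in permutations(a))


def right_side(a, k):
    """Sorted values of max partial sum of pi_D plus min partial sum of pi_D', |D| = n - k."""
    n = len(a)
    values = []
    for D in combinations(range(n), n - k):
        rest = [a[i] for i in range(n) if i not in D]
        for pd in permutations([a[i] for i in D]):
            for pc in permutations(rest):
                values.append(max(partial_sums(pd)) + min(partial_sums(pc)))
    return sorted(values)


a = [3, -1, -4, 2, 1]  # arbitrary real numbers; ties are allowed
print(all(left_side(a, k) == right_side(a, k) for k in range(len(a) + 1)))
```

Both sides contain $n!$ values, since $\binom{n}{k}(n-k)!\,k! = n!$. The check therefore compares two lists of the same length element by element.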


6. Continuous case. Let $x_t$ be a centered separable process with stationary independent increments, starting at the origin; $E(\exp i\sigma x_t) = \exp t\omega(\sigma)$. For a real number $x$ let $L_t(x)$ be the occupation time of the half-line $[x, \infty)$, i.e. the measure of the set of $\tau$, $0 \le \tau \le t$, for which $x_\tau \ge x$, and let

$$L_t = L_t(0 + 0),$$

the length of time during which $x_\tau$ is positive.

Let $X_j$ be independent, each having the distribution of $x_\lambda$, $\lambda > 0$. Then the partial sum $S_n$ has the distribution of $x_{n\lambda}$. It is natural to expect that if $\lambda \to 0$, $n \to \infty$, $n\lambda \to t > 0$, then the behavior of the first $n$ partial sums will closely approximate the behavior of the process on the interval $[0, t]$. This is of course a well-known idea that has been exploited by many authors; the treatment here is close in spirit to that of Baxter and Donsker [3], who studied $\bar{x}_t = \sup_{0 \le \tau \le t}x_\tau$. We have the following result, which in principle yields the joint distribution of $L_t \,\&\, x_t$:

THEOREM 6.1. The Laplace transform of $\mathrm{cf}(L_t \,\&\, x_t)$ is given by the relation

$$s\int_0^{\infty}e^{-st}\,\mathrm{cf}(L_t \,\&\, x_t)\, dt = \exp\int_0^{\infty}t^{-1}e^{-st}\bigl[\mathrm{cf}(t\,e(x_t) \,\&\, x_t) - 1\bigr]\, dt. \tag{6.1}$$


PROOF: In the identity (2.1c) let $w = e^{-s\lambda}$, replace $\rho$ by $\rho\lambda$, and multiply both sides by $1 - e^{-s\lambda}$. There results

$$\begin{aligned}
(1 - e^{-s\lambda})&\sum_{n=0}^{\infty}e^{-ns\lambda}E\bigl(\exp i[\rho\lambda N_n + \sigma S_n]\bigr)\\
&= \exp\sum_{n=1}^{\infty}n^{-1}e^{-ns\lambda}\bigl\{E\bigl(\exp i[\rho n\lambda\,e(S_n) + \sigma S_n]\bigr) - 1\bigr\}.
\end{aligned}\tag{6.2}$$

It is not hard to show that as $\lambda \to 0$, $n\lambda \to t$, $\lambda N_n$ approaches $L_t$ in the mean of order one; hence the joint distribution of $\lambda N_n \,\&\, S_n$ tends to that of $L_t \,\&\, x_t$. Then the left side of (6.2) approaches that of (6.1). The integral on the right side of (6.1) clearly exists, because the quantity in square brackets can be written

$$[\exp(i\rho t) - 1]\,E(\exp i\sigma x_t;\ x_t > 0) + \exp t\omega(\sigma) - 1,$$

which is $O(t)$ as $t \to 0$. Hence the right side of (6.2) approaches that of (6.1) and the proof is complete.

We remark that in a special case formulas of the "arcsine law" variety are recovered. In fact, if we put $\sigma = 0$ in (6.1) and write $\Pr\{x_t > 0\} = p_t$, we obtain

$$s\int_0^{\infty}e^{-st}\,\mathrm{cf}(L_t)\, dt = \exp\int_0^{\infty}t^{-1}e^{-st}(\exp i\rho t - 1)\,p_t\, dt. \tag{6.3}$$

For some interesting processes $p_t$ is constant, say $p_t = p$, $0 < p < 1$. Writing $q = 1 - p$, (6.3) then reduces to

$$\int_0^{\infty}e^{-st}\,\mathrm{cf}(L_t)\, dt = (s - i\rho)^{-p}s^{-q}. \tag{6.4}$$
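The passage from (6.3) to (6.4) rests on a Frullani integral. For $s > 0$ the computation can be sketched as follows; this worked step is added for the reader and is not reproduced from the paper.

```latex
\[
\int_0^\infty t^{-1}e^{-st}\bigl(e^{i\rho t}-1\bigr)\,p\,dt
  = p\int_0^\infty \frac{e^{-(s-i\rho)t}-e^{-st}}{t}\,dt
  = p\log\frac{s}{s-i\rho},
\]
\[
s\int_0^\infty e^{-st}\,\mathrm{cf}(L_t)\,dt=\Bigl(\frac{s}{s-i\rho}\Bigr)^{p}
\quad\Longrightarrow\quad
\int_0^\infty e^{-st}\,\mathrm{cf}(L_t)\,dt = s^{p-1}(s-i\rho)^{-p} = (s-i\rho)^{-p}s^{-q}.
\]
```

Both exponents $s$ and $s - i\rho$ have positive real part, so the principal branch of the logarithm is the one that applies.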
It follows at once from (6.4) that $L = t^{-1}L_t$ has probability density $f(L)$ given by

$$f(L) = \pi^{-1}(\sin\pi p)\,L^{-q}(1 - L)^{-p}, \qquad 0 < L < 1;$$

this density appears in Spitzer [7] and Sparre Andersen [6], in connection with the limiting distribution of $n^{-1}N_n$.
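Since $L_t = tL$, we have $\int_0^\infty e^{-st}E(e^{i\rho tL})\,dt = E[(s - i\rho L)^{-1}]$. So (6.4) states that $E[(s - i\rho L)^{-1}] = (s - i\rho)^{-p}s^{-q}$ when $L$ has the density $\pi^{-1}(\sin\pi p)L^{-q}(1 - L)^{-p}$. The Python sketch below checks this numerically. The substitutions that remove the endpoint singularities, the Simpson grid, and the sample values of $p$, $s$, $\rho$ are choices made here, not part of the paper.

```python
import math


def simpson(g, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def expect(phi, p, m=2000):
    """E[phi(L)] for the density sin(pi p)/pi * L^(-q) * (1 - L)^(-p) on (0, 1).

    On [0, 1/2] substitute L = u^(1/p), and on [1/2, 1] substitute 1 - L = v^(1/q).
    Each substitution cancels one endpoint singularity, so Simpson's rule is
    applied to bounded integrands.
    """
    q = 1 - p
    c = math.sin(math.pi * p) / math.pi
    left = simpson(lambda u: phi(u ** (1 / p)) * (1 - u ** (1 / p)) ** (-p) / p,
                   0.0, 0.5 ** p, m)
    right = simpson(lambda v: phi(1 - v ** (1 / q)) * (1 - v ** (1 / q)) ** (-q) / q,
                    0.0, 0.5 ** q, m)
    return c * (left + right)


p, s, rho = 0.3, 1.5, 2.0
q = 1 - p
print(expect(lambda L: 1.0, p))  # total mass, close to 1
print(expect(lambda L: L, p))    # mean, close to p
numeric = expect(lambda L: 1 / (s - 1j * rho * L), p)
closed_form = (s - 1j * rho) ** (-p) * s ** (-q)
print(abs(numeric - closed_form))
```

The mean comes out as $p$, as it should for this Beta$(p, q)$ law, because $\pi^{-1}\sin\pi p = 1/B(p, q)$.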

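One can also obtain a formula for the general case $L_t(x)$, or rather, for a transform on the variable $x$. The formula is

$$\begin{aligned}
s\int_0^{\infty}e^{-st}\,dt&\int \exp(i\rho x)\, d_x E\bigl(\exp[-\lambda L_t(x) + i\sigma x_t]\bigr)\\
&= \lambda(s + \lambda)^{-1}\exp\int_0^{\infty}t^{-1}e^{-st}\bigl\{[\mathrm{cf}(x_t^+ \,\&\, x_t) - 1] + e^{-\lambda t}[\mathrm{cf}(x_t^- \,\&\, x_t) - 1]\bigr\}\, dt.
\end{aligned}\tag{6.5}$$

We are led to this by writing the left side of (2.1a) in the form

$$(1 - z)^{-1}\sum_{n=0}^{\infty}w^n\int \exp(i\rho x)\, d_x E\bigl(z^{N_n(x)}\exp i\sigma S_n\bigr).$$

We then multiply both sides by $(1 - w)(1 - z) = [(1 - z)(1 - wz)^{-1}](1 - w)(1 - wz)$, set $w = e^{-s\delta}$ and $z = e^{-\lambda\delta}$, and let $\delta \to 0$. Here $\delta$ is the step that was written $\lambda$ in the proof of Theorem 6.1. As before, the left side converges to the left side of (6.5), and therefore the right side converges too. Its limit is the expression on the right of (6.5); the integral exists because the bracketed quantity can be written

$$(1 - e^{-\lambda t})\,\mathrm{cf}(x_t^+ \,\&\, x_t) + e^{-\lambda t}\bigl[e^{t\omega(\sigma)} + e^{t\omega(\rho + \sigma)} - 1\bigr] - 1,$$

which is $O(t)$ as $t \to 0$.

Letting $\lambda \to \infty$ in (6.5) we obtain formally

$$s\int_0^{\infty}e^{-st}\,\mathrm{cf}(\bar{x}_t \,\&\, x_t)\, dt = \exp\int_0^{\infty}t^{-1}e^{-st}\bigl[\mathrm{cf}(x_t^+ \,\&\, x_t) - 1\bigr]\, dt,$$

which in effect extends the result of Baxter and Donsker to the joint distribution.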


REFERENCES

[1] Glen Baxter, "An operator identity," Pacific J. Math., Vol. 8 (1958), pp. 649-663.

[2] Glen Baxter, "A two dimensional operator identity with application to the change of sign in sums of random variables," Trans. Amer. Math. Soc., Vol. 96 (1960), pp. 210-221.

[3] Glen Baxter and M. D. Donsker, "On the distribution of the supremum functional for processes with stationary independent increments," Trans. Amer. Math. Soc., Vol. 85 (1957), pp. 73-87.

[4] Félix Pollaczek, "Fonctions caractéristiques de certaines répartitions définies au moyen de la notion d'ordre, application à la théorie des attentes," C. R. Acad. Sci. Paris, Vol. 234 (1952), pp. 2334-2336.

[5] Erik Sparre Andersen, "On sums of symmetrically dependent random variables," Skand. Aktuarietidskr., Vol. 36 (1953), pp. 123-138.

[6] Erik Sparre Andersen, "On the fluctuations of sums of independent random variables II," Math. Scand., Vol. 2 (1954), pp. 195-223.

[7] Frank Spitzer, "A combinatorial lemma and its application to probability theory," Trans. Amer. Math. Soc., Vol. 82 (1956), pp. 323-339.

[8] J. G. Wendel, "Spitzer's formula: a short proof," Proc. Amer. Math. Soc., Vol. 9 (1958), pp. 905-908.

[9] J. G. Wendel, "Order statistics of partial sums I, II," abstracts, Notices Amer. Math. Soc., Vol. 5 (1958), p. 693, Vol. 6 (1959), p. 145.





PROBABILITY DISTRIBUTIONS RELATED TO RANDOM MAPPINGS 


By Bernard Harris
University of Nebraska 

1. Introduction and Summary. A Random Mapping Space $(X, \mathfrak{F}, P)$ is a triplet, where $X$ is a finite set of elements $x$ of cardinality $n$, $\mathfrak{F}$ is a set of transformations $T$ of $X$ into $X$, and $P$ is a probability measure over $\mathfrak{F}$.

In this paper, four choices of $\mathfrak{F}$ are considered:

(I) $\mathfrak{F}$ is the set of all transformations of $X$ into $X$.

(II) $\mathfrak{F}$ is the set of all transformations of $X$ into $X$ such that $Tx \neq x$ for each $x \in X$.

(III) $\mathfrak{F}$ is the set of one-to-one mappings of $X$ onto $X$.

(IV) $\mathfrak{F}$ is the set of one-to-one mappings of $X$ onto $X$ such that $Tx \neq x$ for each $x \in X$.

In each case $P$ is taken as the uniform probability distribution over $\mathfrak{F}$.

If $x \in X$ and $T \in \mathfrak{F}$, we will define $T^k x$ as the $k$th iteration of $T$ on $x$, where $k$ is an integer, i.e. $T^k x = T(T^{k-1}x)$, and $T^0 x = x$ for all $x$. The reader should note that, in general, $T^k x$, $k < 0$, may not exist or may not be uniquely determined.

If for some $k \ge 0$, $T^k x = y$, then $y$ is said to be a $k$th image of $x$ in $T$. The set of successors of $x$ in $T$, $S_T(x)$, is the set of all images of $x$ in $T$, i.e.,

$$S_T(x) = \{x, Tx, T^2x, \dots, T^n x\},$$

which need not be all distinct elements.

If for some $k \le 0$, $T^k x = y$, then $y$ is said to be a $k$th inverse of $x$ in $T$. The set of all $k$th inverses of $x$ in $T$ is $T^k(x)$, and

$$P_T(x) = \bigcup_{k=-\infty}^{0}T^k(x)$$

is the set of predecessors of $x$ in $T$.

If there exists an $m > 0$ such that $T^m x = x$, then $x$ is a cyclical element of $T$, and the set of elements $x, Tx, T^2x, \dots, T^{m-1}x$ is the cycle containing $x$, $C_T(x)$. If $m$ is the smallest positive integer for which $T^m x = x$, then $C_T(x)$ has cardinality $m$.

We note further an interesting equivalence relation induced by $T$. If there exists a pair of integers $k_1$, $k_2$ such that

$$T^{k_1}x = T^{k_2}y,$$

then $x \sim y$ under $T$. It is readily seen that this is in fact an equivalence relation, and hence decomposes $X$
Received August 20, 1957; revised September 7, 1959. 
1045 





1046 BERNARD HARRIS 


into equivalence classes, which we shall call the components of $X$ in $T$, and designate by $K_T(x)$ the component containing $x$.

We define $s_T(x)$ to be the number of elements in $S_T(x)$, $p_T(x)$ to be the number of elements in $P_T(x)$, and $l_T(x)$ to be the number of elements in the cycle contained in $K_T(x)$ (i.e. $l_T(x)$ = the number of elements in $C_T(x)$ if $x$ is cyclical). We designate by $q_T$ the number of elements of $X$ cyclical in $T$, and by $r_T$ the number of components of $X$ in $T$.

Rubin and Sitgreaves [9], in a Stanford Technical Report, have obtained the distributions of $s$, $p$, $l$, $q$, and have given a generating function for the distribution of $r$ in case I. Folkert [3], in an unpublished doctoral dissertation, has obtained the distribution of $r$ in cases I and II. The distribution of $r$ in case III is classical and may be found in Feller [2], Gontcharoff [4], and Riordan [8]. In the present paper, a number of these earlier results are rederived and extended. Specifically, for cases I and II, we compute the probability distributions of $s$, $p$, $l$, $q$ and $r$. In cases III and IV the distributions of $l$ and $r$ are given. In addition some asymptotic distributions and low order moments are obtained.

For the convenience of the reader, an index of notations having a fixed meaning 
is provided in the appendix to the paper. 


2. Representation of $T$ as a directed graph. It will be convenient to represent elements of $\mathfrak{F}$ as directed graphs. For example, let $n = 10$, $X = \{1, 2, 3, 4, \dots, 10\}$, and

$T(1) = 4$, $T(2) = 5$, $T(3) = 9$, $T(4) = 8$, $T(5) = 5$,
$T(6) = 8$, $T(7) = 9$, $T(8) = 1$, $T(9) = 4$, $T(10) = 8$.

Then $T$ has the representation below:

[Figure: directed-graph representation of this $T$ (not reproduced).]
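The quantities defined in Section 1 can be read off this example mechanically. The small Python sketch below uses a dictionary encoding of $T$; the encoding and the helper names are illustrative assumptions. For $x = 3$ it finds $s_T(3) = 5$ and $l_T(3) = 3$, and for $x = 4$ it finds $p_T(4) = 8$. It also reports $r_T = 2$ components and $q_T = 4$ cyclical elements.

```python
# The mapping T of the example, on X = {1, ..., 10}.
T = {1: 4, 2: 5, 3: 9, 4: 8, 5: 5, 6: 8, 7: 9, 8: 1, 9: 4, 10: 8}


def successors(T, x):
    """S_T(x) = {x, Tx, T^2 x, ...}, in order of first appearance."""
    orbit = []
    while x not in orbit:
        orbit.append(x)
        x = T[x]
    return orbit


def cycle(T, x):
    """The cycle contained in K_T(x); its length is l_T(x)."""
    orbit = successors(T, x)
    start = orbit.index(T[orbit[-1]])  # the first element of the orbit to recur
    return orbit[start:]


def predecessors(T, x):
    """P_T(x) = {y : T^k y = x for some k >= 0}; x itself is included."""
    return {y for y in T if x in successors(T, y)}


# Two elements lie in the same component exactly when they reach the same cycle.
cycles = {frozenset(cycle(T, x)) for x in T}
components = [{x for x in T if frozenset(cycle(T, x)) == c} for c in cycles]

print(len(successors(T, 3)), len(cycle(T, 3)), len(predecessors(T, 4)))  # 5 3 8
print(len(components), sum(len(c) for c in cycles))                      # 2 4
```

The two components are $\{2, 5\}$ and $\{1, 3, 4, 6, 7, 8, 9, 10\}$, with cycles $\{5\}$ and $\{1, 4, 8\}$.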


3. Probability Distribution for Case I. In case I, $P(T) = 1/n^n$ for all $T \in \mathfrak{F}$. We now turn to the computation of the probability distributions of $s$ and $l$, the number of elements in $S_T(x)$ and the number of elements in the cycle contained in $K_T(x)$ respectively.

Then, for any choice of $x$, we have:

$$P(s = k, l = j) = P\{T^r x \neq x, Tx, \dots, T^{r-1}x\ (0 < r \le k - 1);\ T^k x = T^{k-j}x\}.$$





RANDOM MAPPINGS 


Hence

$$P(s = k, l = j) = \frac{(n-1)!}{(n-k)!\,n^k}, \qquad 1 \le j \le k \le n, \tag{3.1}$$

and summing over $j$, we have

$$P(s = k) = \frac{(n-1)!\,k}{(n-k)!\,n^k}, \tag{3.2}$$

$$P(l = j) = \sum_{k=j}^{n}\frac{(n-1)!}{(n-k)!\,n^k}. \tag{3.3}$$

From consideration of symmetry, we note trivially that

$$E(l) = E\{(s + 1)/2\}. \tag{3.4}$$
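Formulas (3.1) to (3.4) are easy to confirm exactly for small $n$ by enumerating all $n^n$ maps. Below is a Python sketch for $n = 5$ and starting point $x = 0$. Encoding $T$ as a tuple and the function names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def s_and_l(T, x):
    """Return (s, l): the size of S_T(x) and the length of the cycle reached from x."""
    orbit = []
    while x not in orbit:
        orbit.append(x)
        x = T[x]
    # x is now the first repeated element, i.e. the entry point of the cycle
    return len(orbit), len(orbit) - orbit.index(x)


def joint_law(n, x=0):
    """Exact law of (s, l) when T is uniform over all n^n maps of {0, ..., n-1} (case I)."""
    law = {}
    for T in product(range(n), repeat=n):
        key = s_and_l(T, x)
        law[key] = law.get(key, 0) + Fraction(1, n ** n)
    return law


def formula_3_1(n, k):
    """(3.1): (n-1)! / ((n-k)! n^k), the same for every 1 <= j <= k."""
    return Fraction(factorial(n - 1), factorial(n - k) * n ** k)


n = 5
law = joint_law(n)
print(all(law.get((k, j), 0) == formula_3_1(n, k)
          for k in range(1, n + 1) for j in range(1, k + 1)))
E_s = sum(prob * k for (k, j), prob in law.items())
E_l = sum(prob * j for (k, j), prob in law.items())
print(E_l == (E_s + 1) / 2)  # (3.4)
```

Given $s = k$, the value of $l$ is uniform on $1, \dots, k$. This is the symmetry behind (3.4), and the joint law above reflects it.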


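We now obtain the asymptotic probability densities of $s$ and $l$. In (3.1) let $k = \sqrt{n}\,x$, $j = \sqrt{n}\,y$, and replace factorials by Stirling's approximation. Then we have

$$P(s = \sqrt{n}\,x,\ l = \sqrt{n}\,y) \sim \frac{1}{n}\,e^{-\sqrt{n}\,x}\Bigl(1 - \frac{x}{\sqrt{n}}\Bigr)^{-(n - \sqrt{n}\,x + 1/2)}.$$

Write

$$\Bigl(1 - \frac{x}{\sqrt{n}}\Bigr)^{-(n - \sqrt{n}\,x + 1/2)} = \exp\Bigl[-\bigl(n - \sqrt{n}\,x + \tfrac{1}{2}\bigr)\log\Bigl(1 - \frac{x}{\sqrt{n}}\Bigr)\Bigr],$$

and expand $\log(1 - x/\sqrt{n})$ in a power series, obtaining

$$P(s = \sqrt{n}\,x,\ l = \sqrt{n}\,y) \sim \frac{1}{n}\,e^{-x^2/2}.$$

Thus, the asymptotic density of $(s/\sqrt{n},\ l/\sqrt{n})$ is

$$f(x, y) = e^{-x^2/2}, \qquad 0 < y \le x < \infty. \tag{3.6}$$

The marginal densities $f_1(x)$, $f_2(y)$ give the asymptotic densities of $s/\sqrt{n}$, $l/\sqrt{n}$ respectively and are easily obtained by integration:

$$f_1(x) = x\,e^{-x^2/2}, \qquad 0 < x, \tag{3.7}$$

$$f_2(y) = \sqrt{2\pi}\,\bigl(1 - \Phi(y)\bigr), \qquad 0 < y, \tag{3.8}$$

where $\Phi(y) = \int_{-\infty}^{y}(2\pi)^{-1/2}e^{-t^2/2}\, dt$.

In numerical computations, the cumulative distribution function $F_2(y)$ is probably more useful than the density function $f_2(y)$ and is therefore given below:

$$F_2(y) = P(Y \le y) = 1 - e^{-y^2/2} + y\sqrt{2\pi}\,\bigl(1 - \Phi(y)\bigr). \tag{3.9}$$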
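To see how well the limit (3.9) describes a large finite $n$, one can compare $F_2(y)$ with the exact value at $m \approx y\sqrt{n}$. Summing (3.1) gives $P(l \le m) = \sum_{k}\min(k, m)\,(n-1)!/((n-k)!\,n^k)$. The Python sketch below uses log-gamma to avoid overflow. The choice $n = 10^6$, the sample values of $y$, and the tolerance used in the test are arbitrary.

```python
import math


def F2(y):
    """(3.9): 1 - exp(-y^2/2) + y * sqrt(2 pi) * (1 - Phi(y))."""
    upper_tail = 0.5 * math.erfc(y / math.sqrt(2))  # 1 - Phi(y)
    return 1 - math.exp(-y * y / 2) + y * math.sqrt(2 * math.pi) * upper_tail


def exact_cdf_of_l(n, m):
    """P(l <= m) in case I, from (3.1): sum over k of min(k, m) (n-1)!/((n-k)! n^k)."""
    total, log_n = 0.0, math.log(n)
    for k in range(1, n + 1):
        a_k = math.exp(math.lgamma(n) - math.lgamma(n - k + 1) - k * log_n)
        if a_k < 1e-18:  # a_k decreases in k, so the remaining terms are negligible
            break
        total += min(k, m) * a_k
    return total


n = 10 ** 6
for y in (0.5, 1.0, 2.0):
    print(y, F2(y), exact_cdf_of_l(n, round(y * math.sqrt(n))))
```

The terms $a_k = (n-1)!/((n-k)!\,n^k)$ satisfy $a_{k+1}/a_k = (n-k)/n < 1$. That ratio justifies stopping the sum once they become tiny.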





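We note further that

$$E(s) \sim (\pi n/2)^{1/2}, \qquad \sigma_s^2 \sim n[2 - (\pi/2)]. \tag{3.10}$$

Hence

$$E(l) \sim \tfrac{1}{2}(\pi n/2)^{1/2}, \qquad \sigma_l^2 \sim n[(2/3) - (\pi/8)]. \tag{3.11}$$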


Formulas (3.1), (3.2), (3.3), and (3.7) have been obtained by Rubin and Sitgreaves [9]. Rubin and Sitgreaves have also shown that

$$P(q = j) = \frac{j\,(n-1)!}{(n-j)!\,n^j}, \qquad j = 1, 2, \dots, n. \tag{3.12}$$

We now prove this using a partition argument due to Katz [7].

Consider the directed graph representation of $T$ and partition $X$ as follows. Let $M_0(T)$ be those elements of $X$ cyclical under $T$. Define $M_1(T)$ to be those elements of $X$ whose images are cyclical under $T$, but are not themselves cyclical. Let $M_2(T)$ be those elements of $X$ whose images under $T$ are in $M_1(T)$. Continuing in this manner until $X$ is exhausted, the $n - j$ non-cyclical elements of $X$ are partitioned into $m(T)$ sets, each non-empty, for $j \neq n$. Designate the cardinality of $M_i(T)$ by $n_i(T)$, $i = 1, 2, \dots, m(T)$.

The number of transformations $T$ with $q = j$ and with $n_1, n_2, \dots, n_m$ fixed is

$$\frac{n!}{j!\,n_1!\,n_2!\cdots n_m!}\;j!\;j^{n_1}n_1^{n_2}\cdots n_{m-1}^{n_m}, \tag{3.13}$$

where $\sum_{i=1}^{m}n_i = n - j$. Hence

$$P(q = j) = n^{-n}\sum\frac{n!}{n_1!\,n_2!\cdots n_m!}\,j^{n_1}n_1^{n_2}\cdots n_{m-1}^{n_m}, \qquad j \neq n, \tag{3.14}$$

where the sum is taken over all non-empty $m$-part partitions of $n - j$ (and over $m$). Katz [7] has shown that

$$\sum\frac{(n-j)!}{n_1!\,n_2!\cdots n_m!}\,j^{n_1}n_1^{n_2}\cdots n_{m-1}^{n_m} = j\,n^{n-j-1}, \tag{3.15}$$

from which we obtain (3.12) for $j \neq n$. We have $j = n$ if and only if $T$ is one-to-one and onto; hence

$$P(q = n) = n!/n^n, \tag{3.16}$$

which coincides with (3.12) for $j = n$.

It is curious to note that this is exactly the same as the distribution of $s$ given in (3.2), and hence has the same asymptotic distribution and asymptotic moments.
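Both (3.12) and Katz's identity can be checked by brute force. Katz's identity states $\sum (n-j)!/(n_1!\cdots n_m!)\,j^{n_1}n_1^{n_2}\cdots n_{m-1}^{n_m} = j\,n^{n-j-1}$, where the sum runs over ordered partitions $(n_1, \dots, n_m)$ of $n - j$ into positive parts. The Python sketch below relies on the fact that $T^n$ maps $X$ onto exactly the set of cyclical elements. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def cyclical_count(T):
    """q_T: T^n maps X onto exactly the set of cyclical elements."""
    image = set(range(len(T)))
    for _ in range(len(T)):
        image = {T[y] for y in image}
    return len(image)


def law_of_q(n):
    """Exact law of q under the uniform distribution on all n^n maps (case I)."""
    counts = [0] * (n + 1)
    for T in product(range(n), repeat=n):
        counts[cyclical_count(T)] += 1
    return [Fraction(c, n ** n) for c in counts]


def formula_3_12(n, j):
    """(3.12), identical to (3.2): j (n-1)! / ((n-j)! n^j)."""
    return Fraction(j * factorial(n - 1), factorial(n - j) * n ** j)


def compositions(total):
    """Ordered tuples of positive integers summing to total (the m-part partitions)."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest


def katz_left_side(n, j):
    """Left side of (3.15): sum (n-j)!/(n_1!...n_m!) j^{n_1} n_1^{n_2} ... n_{m-1}^{n_m}."""
    total = Fraction(0)
    for parts in compositions(n - j):
        term, previous = Fraction(factorial(n - j)), j
        for part in parts:
            term *= Fraction(previous ** part, factorial(part))
            previous = part
        total += term
    return total


n = 5
print(law_of_q(n)[1:] == [formula_3_12(n, j) for j in range(1, n + 1)])
print(all(katz_left_side(m, j) == j * Fraction(m) ** (m - j - 1)
          for m in range(2, 8) for j in range(1, m)))
```

The right side $j\,n^{n-j-1}$ is the number of forests on $n$ labelled vertices rooted at $j$ given vertices. This is one way to see why Katz's identity holds.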


The distribution of $p$ has been obtained by Rubin and Sitgreaves [9],

$$P(p = j) = \frac{(n-1)!\,j^{j-2}(n-j)^{n-j}}{(j-1)!\,(n-j)!\,n^{n-1}}, \qquad j = 1, 2, \dots, n. \tag{3.17}$$

We establish this as follows:

Let $X_{j-1}$ be $j - 1$ specified elements of $X$, say $x_1, x_2, \dots, x_{j-1}$. Let $x$ be a distinguished element of $X$ not in $X_{j-1}$. Then define $\mathfrak{F}_1$ as those transformations $T$ in $\mathfrak{F}$ such that $T(X - (X_{j-1} \cup x)) \subset X - (X_{j-1} \cup x)$. Define $\mathfrak{F}_2$ as those transformations in $\mathfrak{F}$ such that $T(X_{j-1}) \subset X_{j-1} \cup x$, and $T^k x_i = x$ for some $k > 0$ and $i = 1, 2, \dots, j - 1$. We further define $\mathfrak{F}^* = \mathfrak{F}_1 \cap \mathfrak{F}_2$. Then

$$\binom{n-1}{j-1}P(T \in \mathfrak{F}^*) = P(p = j), \tag{3.18}$$

and

$$P(T \in \mathfrak{F}^*) = P(T \in \mathfrak{F}_1)\,P(T \in \mathfrak{F}_2). \tag{3.19}$$

We readily see that

$$P(T \in \mathfrak{F}_1) = [(n - j)/n]^{n-j}. \tag{3.20}$$

Hence we have only to compute $P(T \in \mathfrak{F}_2)$. For any $T \in \mathfrak{F}_2$, we can, by restricting attention to $X_{j-1}$, define an associated transformation $T^*_{j-1}$ which has $T^*_{j-1}x_i = Tx_i$, $i = 1, 2, \dots, j - 1$, and $T^*_{j-1}x = x$. Let $N_{j-1}$ be the number of distinct transformations which can be constructed in this manner from $T \in \mathfrak{F}_2$. Since, in $\mathfrak{F}$, $x$ has $n$ equally likely images under $T$, we have

$$P(T \in \mathfrak{F}_2) = N_{j-1}/n^{j-1}, \tag{3.21}$$

and $N_{j-1}$ is readily obtained by Katz's Lemma and the partition argument used in (3.13). Hence

$$N_{j-1} = \frac{1}{j}\sum\frac{j!}{n_1!\,n_2!\cdots n_m!}\,n_1^{n_2}n_2^{n_3}\cdots n_{m-1}^{n_m} = j^{j-2}, \qquad j > 1, \tag{3.22}$$

and, trivially, $N_0 = 1$. In (3.22), the sum is over all non-empty $m$-part partitions of $j - 1$, and the factor $1/j$ is obtained by distinguishing the element $x$. Hence

$$P(p = j) = \binom{n-1}{j-1}\frac{j^{j-2}}{n^{j-1}}\Bigl(\frac{n-j}{n}\Bigr)^{n-j},$$

and (3.17) is established.
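Below is a brute-force check, for $n = 5$ and $x = 0$, of (3.17), $P(p = j) = \binom{n-1}{j-1}j^{j-2}(n-j)^{n-j}/n^{n-1}$, and of the symmetry $E(s) = E(p)$ noted next in the text. It is a Python sketch, and the names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_and_s(T, x):
    """Return (p, s): the sizes of P_T(x) (x included) and of S_T(x)."""
    n = len(T)
    p = 0
    for y in range(n):
        z = y
        for _ in range(n):  # if x is reachable from y, it is reached within n - 1 steps
            if z == x:
                p += 1
                break
            z = T[z]
    orbit, z = [], x
    while z not in orbit:
        orbit.append(z)
        z = T[z]
    return p, len(orbit)


def formula_3_17(n, j):
    """(3.17): C(n-1, j-1) j^{j-2} (n-j)^{n-j} / n^{n-1}."""
    return Fraction(comb(n - 1, j - 1) * (n - j) ** (n - j), n ** (n - 1)) * Fraction(j) ** (j - 2)


n = 5
law_p = [Fraction(0)] * (n + 1)
E_p = E_s = Fraction(0)
for T in product(range(n), repeat=n):
    p, s = p_and_s(T, 0)
    law_p[p] += Fraction(1, n ** n)
    E_p += Fraction(p, n ** n)
    E_s += Fraction(s, n ** n)
print(law_p[1:] == [formula_3_17(n, j) for j in range(1, n + 1)])
print(E_p == E_s)
```

The factor $j^{j-2}$ is Cayley's count of trees on $j$ labelled vertices. It agrees with $N_{j-1}$ in (3.22), the number of ways the $j - 1$ specified elements can all drain into $x$.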
We now note an interesting relationship,¹

$$E(s) = E(p). \tag{3.23}$$

This is established at once by symmetry. For any $T \in \mathfrak{F}$ such that $y$ is a successor of $x$, there is a corresponding $T' \in \mathfrak{F}$ with $y$ a predecessor of $x$; the correspondence is accomplished by interchanging $x$ and $y$ in the directed graphs.

¹ This was pointed out by D. Blackwell in a private conversation with the author.

We may also note an interesting physical property of directed graphs of this type, which holds for every $T \in \mathfrak{F}$. For any $T \in \mathfrak{F}$, let $r_j$ be the number of elements $x$ for which $T^{-1}x$ has $j$ elements. Then

$$\sum_{j=0}^{n}j\,r_j = n. \tag{3.24}$$

Also,

$$\sum_{j=0}^{n}r_j = n. \tag{3.25}$$

Thus,

$$r_0 = \sum_{j=2}^{n}(j - 1)\,r_j. \tag{3.26}$$

From this it follows at once that

$$E(p^{(1)}) = 1, \tag{3.27}$$

where $p^{(1)}$ is the number of elements in $T^{-1}(x)$. The distribution of $p^{(1)}$ is readily seen to be

$$P\{p^{(1)} = j\} = \binom{n}{j}\Bigl(\frac{1}{n}\Bigr)^{j}\Bigl(\frac{n-1}{n}\Bigr)^{n-j}. \tag{3.28}$$


We proceed now to the question of the probability distribution of $r$, the number of components of $X$ in $T$. Folkert [3] has obtained the distribution and has shown

$$P\{r = k\} = \frac{1}{n^n}\sum_{w=k}^{n}\frac{S_w^{(k)}}{w!}\sum_{k_1, \dots, k_w}\frac{n!}{k_1!\,k_2!\cdots k_w!}\,k_1^{k_1}k_2^{k_2}\cdots k_w^{k_w}, \tag{3.29}$$

where the $S_w^{(k)}$ are Stirling's Numbers of the First Kind. The sum over $k_1, k_2, \dots, k_w$ runs over all choices with $k_i > 0$ $(i = 1, 2, \dots, w)$ and $\sum_{i=1}^{w}k_i = n$.

In this paper we obtain a probability generating function for the number of components, which has a good deal of intrinsic interest because of its relation to Faà di Bruno's formula (Jordan [6]) and the exponential polynomials of Bell [1].

Let $k_i$ denote the number of components with exactly $i$ elements. Then every $T \in \mathfrak{F}$ determines an $n$-tuple $(k_1, k_2, \dots, k_n)$. Hence, for every specification of $(k_1, k_2, \dots, k_n)$ we have a set of transformations $\mathfrak{F}_{k_1, k_2, \dots, k_n}$ in $\mathfrak{F}$. Then

$$P(T \in \mathfrak{F}_{k_1, k_2, \dots, k_n}) = \frac{n!\,I_1^{k_1}I_2^{k_2}\cdots I_n^{k_n}}{(1!)^{k_1}(2!)^{k_2}\cdots(n!)^{k_n}\,k_1!\,k_2!\cdots k_n!\,n^n}, \tag{3.30}$$

where $0 \le k_i \le n$ and $\sum_{i=1}^{n}ik_i = n$. Here $I_j/j^j$ $(j = 1, 2, \dots, n)$ is the probability that a transformation $T_j$ on $j$ elements $X_j$ is indecomposable, i.e. that $K_{T_j}(x) = X_j$ for all $x \in X_j$. We have

$$\frac{I_j}{j^j} = \sum_{i=1}^{j}P\bigl(l = i,\ K_{T_j}(x) = X_j\bigr) = \sum_{i=1}^{j}\frac{(j-1)!}{(j-i)!\,j^i}.$$

Hence

$$I_j = (j-1)!\sum_{i=0}^{j-1}\frac{j^i}{i!}. \tag{3.31}$$
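This result has been obtained earlier by both Katz [7] and Rubin and Sitgreaves [9]. Then, the generating function of $k_1, k_2, \dots, k_n$ is given by

$$G(x_1, x_2, \dots, x_n) = \sum_{k_1, k_2, \dots, k_n}\frac{n!\,(I_1x_1)^{k_1}(I_2x_2)^{k_2}\cdots(I_nx_n)^{k_n}}{(1!)^{k_1}(2!)^{k_2}\cdots(n!)^{k_n}\,k_1!\,k_2!\cdots k_n!\,n^n}, \tag{3.32}$$

since the coefficient of $x_1^{k_1}x_2^{k_2}\cdots x_n^{k_n}$ is $P(T \in \mathfrak{F}_{k_1, k_2, \dots, k_n})$ for $\sum_{i=1}^{n}ik_i = n$. Since $r = \sum_{i=1}^{n}k_i$,

$$G(x_1, x_2, \dots, x_n) = \frac{n!}{n^n}\sum_{k_1, \dots, k_n}\frac{1}{r!}\cdot\frac{r!}{k_1!\,k_2!\cdots k_n!}\Bigl(\frac{I_1x_1}{1!}\Bigr)^{k_1}\Bigl(\frac{I_2x_2}{2!}\Bigr)^{k_2}\cdots\Bigl(\frac{I_nx_n}{n!}\Bigr)^{k_n},$$

and

$$G(x_1, x_2, \dots, x_n) = \frac{n!}{n^n}\sum_{r}\frac{1}{r!}\Bigl(\sum_{j=1}^{n}\frac{I_jx_j}{j!}\Bigr)^{r}, \tag{3.33}$$

where only the terms with $\sum_i ik_i = n$ are retained. We can extend the definition to $G(x_1, x_2, \dots)$ with no loss of generality, since this will in no way affect the coefficient of $x_1^{k_1}x_2^{k_2}\cdots x_n^{k_n}$. Hence

$$G(x_1, x_2, \dots) = \frac{n!}{n^n}\exp\sum_{j=1}^{\infty}\frac{I_jx_j}{j!}. \tag{3.34}$$

If in (3.34) we replace $x_i$ by $x^i$, the coefficient of $x^n$ in $G(x, x^2, \dots)$ is 1 for all $n$. Thus, we have

$$\exp\sum_{i=1}^{\infty}\frac{I_ix^i}{i!} = \sum_{n=0}^{\infty}\frac{n^n}{n!}x^n \tag{3.35}$$

and

$$\sum_{i=1}^{\infty}\frac{I_ix^i}{i!} = \log\sum_{i=0}^{\infty}\frac{i^i}{i!}x^i. \tag{3.36}$$

Replacing $x_i$ by $t^ix_i$ in (3.34) we obtain

$$G(tx_1, t^2x_2, \dots) = \frac{n!}{n^n}\exp\sum_{j=1}^{\infty}\frac{I_jt^jx_j}{j!}. \tag{3.37}$$

In (3.37) the coefficient of $t^n$ gives the probability of any possible decomposition of $X$ into components, with the exponents of the $x_i$ indexing the decomposition. Finally we observe that, replacing $x_i$ by $tx^i$, we get

$$G(tx, tx^2, \dots) = \frac{n!}{n^n}\exp\Bigl(t\sum_{i=1}^{\infty}\frac{I_ix^i}{i!}\Bigr), \tag{3.38}$$

or equivalently

$$G(tx, tx^2, \dots) = \frac{n!}{n^n}\sum_{k=0}^{\infty}\frac{t^k}{k!}\Bigl(\sum_{i=1}^{\infty}\frac{I_ix^i}{i!}\Bigr)^{k}, \tag{3.39}$$

and the coefficient of $t^kx^n$ in $G(tx, tx^2, \dots)$ is $P\{r = k\}$.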
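The generating function (3.39) translates directly into a computation. Take $I_j$ from (3.31). Then $P\{r = k\}$ equals $(n!/n^n)/k!$ times the coefficient of $x^n$ in $(\sum_i I_i x^i/i!)^k$. The Python sketch below evaluates this with exact truncated power series and compares it with a count of components over all $n^n$ maps. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def I(j):
    """(3.31): number of indecomposable maps of a j-set, (j-1)! sum_{i<j} j^i / i!."""
    return sum(Fraction(factorial(j - 1) * j ** i, factorial(i)) for i in range(j))


def law_of_r_from_gf(n):
    """P{r = k} as the coefficient of t^k x^n in (3.39), via truncated power series."""
    base = [Fraction(0)] + [I(i) / factorial(i) for i in range(1, n + 1)]
    power = [Fraction(1)] + [Fraction(0)] * n  # (sum_i I_i x^i / i!)^0, truncated at x^n
    law = [Fraction(0)] * (n + 1)
    for k in range(1, n + 1):
        power = [sum(power[a] * base[d - a] for a in range(d + 1)) for d in range(n + 1)]
        law[k] = Fraction(factorial(n), n ** n) * power[n] / factorial(k)
    return law


def component_count(T):
    """r_T: every component contains exactly one cycle, so count the cycles."""
    n = len(T)
    cyclical = set(range(n))
    for _ in range(n):  # T^n maps X onto the cyclical elements
        cyclical = {T[y] for y in cyclical}
    seen, cycles = set(), 0
    for c in cyclical:
        if c not in seen:
            cycles += 1
            z = c
            while z not in seen:  # walk once around the cycle through c
                seen.add(z)
                z = T[z]
    return cycles


def law_of_r_by_enumeration(n):
    """Exact law of r under the uniform distribution on all n^n maps."""
    counts = [0] * (n + 1)
    for T in product(range(n), repeat=n):
        counts[component_count(T)] += 1
    return [Fraction(c, n ** n) for c in counts]


print([int(I(j)) for j in range(1, 6)])  # [1, 3, 17, 142, 1569]
print(all(law_of_r_from_gf(n) == law_of_r_by_enumeration(n) for n in range(1, 6)))
```

Only the coefficients up to $x^n$ matter, so the series are truncated at degree $n$ throughout.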
We now employ the generating function given above to obtain Folkert’s 
formula (3.29). From (3.38), we have 


e n! = I,2° P 
(3.40) coefficient of ¢* hi (= ue) ‘ 
and from (3.36) we have 


(3.41) coefficient of * = Pa [toe ( o >} 52 “VT, 


Since 


log (1 + w)* = Este, 


see, for example, Jordan [6], p. 146. oe this in (3.41), we get 


(3.42) coefficient of t* = = L sk ai s (Fs 2‘T, 
n"k\ poe pl il 


and, expansion by the multinomial theorem gives 


coefficient of & = 2! » os = Ss 
nk! por pw! 


“ . “(tay (Z2)" (Le) 
me kyl ke! eee k,! li” ai" ai” q 


Zkimw 
1 


To find the coefficient of xz” in (3.43) it suffices to restrict the second sum to 
non-negative n-tuples (k,, k:, --- , ka) with oP k; = w, Doi ik, = n; hence 


OA. Fe 58 ~ a mi wishes Ser ~~ al (7;) i) = (z)” 


which coincides with (3.29), except that partitions of n are enumerated without 
regard to order in (3.44), and thus we have obtained an alternate form of 
Folkert’s formula. Rubin and Sitgreaves [9] noted that n“E(s) = E(s"'). We 
remark, further, that it is even more curious that 


(3.45) n“E(s) = P(r = 1) = P(l = 1) = n“E(p) = n“E(q). 





RANDOM MAPPINGS 1053 


4. Probability Distribution for Case II. In case II, P(T) = (n — 1)™ for all 
T ¢ 3. As in case I, we first consider the probability distribution of s and 1. 
Computing exactly as in Section 3, we obtain 


; (n — 2)! é 
(4.1) P(s= l= 3) = C—hFig = bl’ 2ajakan, 


and 
(n — 2)(k — 1) 
(n — 1)*"(n — k)!’ 


: - (n — 2)! 
an POD = BGR ey 


Comparing these results with (3.1), (3.2) and (3.3), we have 
P(s =k\I,n) = P(s =k+1|II,n+1) 
P(s=k,l=j\I,n) = P(s=k+1,l=j+1|II,n+1), 


(4.2) P(s =k) = 


(4.4) 


and 


P(L=j|I,n) = P(l=j+1|I1,n+1). 


Hence 

(4.5) E(l| II,n+1) = E(l|I,n) +1 

and 

(4.6) E(s|II,n +1) = E(s|I,n) +1. 

From (3.4) we have 

(4.7) 4E(s|II,n) +1 = E(l\ II, n). 

Then, by analogy with (3.6), we note that the asymptotic density of 


(s//n — 1, 1/-/n — 1) is 
(4.8) f(z,y) = e™, Osys2r<@, 


giving the same marginal density functions as (3.7) and (3.8). 
Now consider the probability distribution of the number of elements of X 
cyclical under T. We show that 


(49) P(q = 3) = nD, ("— 1) / in 0)", 


where D; is the jth derangement number, i.e., D; is the nearest integer to j!/e, 
i ca 0, and Dy = 1. 

The proof is identical with the proof of (3.12) except that the j! in the nu- 
merator of (3.13) is replaced by D;. Hence an application of Katz’s lemma 





1054 BERNARD HARRIS 


gives 
‘ ] n! Mutt cies . 
(4.10) P(q = 9) = 7—7, 2 amma er iP init ss nas, GM, 
the sum being taken over all non-empty m-part partitions of n — j. Hence 
, 1 nin™ >" ; 
4.11 P(q = j) = ——~ Dj ->————————| , 
ae) (= 3) = Gop Gia! — 


and thus P(q = j) is given by (4.9). The case 7 = n, is given trivially by (4.9). 
The asymptotic distribution is obtained by replacing D; by j!/e, and re- 
placing factorials by Stirling’s approximation. Then, letting 7 = +/ny, we get 


(4.12) fy) =y™, 0<y<», 


for the asymptotic density of gn™*. The agreement of (4.12) with (3.7) can 
hardly be surprising in view of the agreement of (3.2) and (3.12). 
We now obtain the distribution of p, 


G— iim —j)im — 1)’ 
P(p =n—1) =0, 


P(p = q) j=1,2,---,n — 2, 


P ¥ P( n! n n” *? 
= l _ — } = paceman y mm, +t 
oon q°‘? ) (n — 1)" 53 (n — 9)! 
This is established as follows. Define X,;_, , 3, , % , and 3*, as in case I. Let x 
be a distinguished element of X. Then, as before, 


(4.14) P(p = j) = (" P ') P(T €5*), 


and 

(4.15) P(T e 3) = P(T ¢€3,)P(T € dh). 

Then 

(4.16) P(T e€3:) = [((n — j — 1)/(m — 1)]”"”. 
Exactly as in (3.22), we can employ Katz’s Lemma to obtain 

(4.17) P(T &€ 2) = j?*/(n — 1)7". 

Combining these we have 


~~) -(@-Wi(e-i-ly_* 
ro-= (2) CHE) ee 





RANDOM MAPPINGS 1055 


The condition Tx # z for all z ¢ X, precludes the possibility of p = n — 1. 
There remains the case p = n. 


P(p =n\z) = ~ P(q = j, Ke(z) = X, Cr(z) # 0) 
(4.18) oa Fs 
2 n” *"D;n! sMa~ ig 
jaa (n — 1)%(7 — 1) (mn — 9)! D; n 
Inasmuch as (3.23) depends only on invariance under the symmetric group 
operating on X and (3.26) is a property of the directed graphs in general, both 
of these apply in II. 
The distribution of p” is obtained trivially, 


o. ~~. (n-1 1 \(n-2V" 
(4.19) P(p -i=( j )(=7q) (244) 


The distribution of r, the number of components, has been computed by 
Folkert [3], and shown to be 


P(r 
(4.20) 


(ky — 1)"*(ky — 1)"* +++ (ky — 1)”, 


where the sum over k, , k:,--- ,k,is over all u-tuples with k; > land > kp =n. 
We will now develop a probability generating function for the number of com- 
ponents, and obtain an alternate derivation of (4.20). The argument parallels 
the same discussion in Case I and hence will only be sketched briefly. 
As in case I, 
niIzl;’ --- It 
(421) PUP C Iestere-ste) © Siega ... wig linls oe hallwe 1)” 


where J ;/(j — 1)’ is the probability that a transformation 7; on j elements X,; 
is indecomposable, i.e., Kr,(z) = X;forallze X;,0 Ski S nand Dof4 ik, = n. 
I; j j jj! 
—_ = ( = . = = ——— 
= 1h 2 Pq = 4, Kr,(2) = X;) LG =1yG —- 1)! 
-F G- Ds 


3 (j — 1)!" 


(4.22) u 


(4.22) has previously been established by Katz [7] using a somewhat different 
argument. Then 


(lex )“*( 1,25)" ake: (1,2,)"* 
4.23 G(x, 2, +++, tn) = aca ee eee 
428) Gay, 2, ***) Se) > de, Sg... ahh) hele 21)" 


is the generating function of k,, k,;,---, k, in the same manner as (3.32). 





1056 BERNARD HARRIS 
Since r = dis k,;, we obtain, after extending the definition to G(z2, 2, ---), 


(4.24) G(22, a, )= 
(n — 


If, in (4.24), we replace z; by z’, we obtain 
(4.25) 


Thus 


(4.26) Cae x 


Replacing x; by t'x; in (4.24), we obtain: 
Se on I; oa 
(4.27) G(t22,t xs, )= fp —i1>™ p > 


Then the coefficient of t” in (4.27) gives the probability of every possible de- 
composition of X into components, in the same manner as (3.37). If we replace 
x; by tz’, we get 


2,3 ! I; 
(4.28) Q(t ta’, --) = exp | 1 a] 


im2 4 


or 
(4.29) G( ta’, tx’, 
giving 
coefficient of tz” in G(tz’, tz’, ---) = Pir = kj. 


We now employ the generating function to obtain an alternate form of Folkert’s 
formula (4.20). From (4.28) we have 


jaa! 


‘ ni = 1,2°F 
(4.30) coefficient of ¢* “2. 1k! be he ; 


and from (4.26) we have 


(4.31) coefficient of t* = in pe oe (1 + yf “ aaa Ws we) y. 


Hence 


coeflicient of “ = nt ye kl Ss: be ea ae 


(n — 1)"k! ot t= 





RANDOM MAPPINGS 
and, as in (3.43), we get 
' = ! 
coefficient of “ = 4 > “i Ss. 


= x 

(n — 1)"k! po yp! * bidkg. kn 20 
Lkiww 
1 


u! [S - mT [2 - vey re is - vel 
kilke! +++ ka! B22 2! ! . 


To find the coefficient of x" in (4.32), it is sufficient to restrict the second sum 


to non-negative n-tupules (k; ,k:,--- ,k-) with SOR k, = wand }°2, ik, = n. 
Thus, we have 


P(r =k) = 


nt Ra 1 1*\** (2°\"" (n — 1)"\* 
(n — 1)" 2 St 2 ella vee ah) (5) is ( ni -) , 


with the second sum restricted to ky,ky,---,k, 2 O, ar ky = yw, and 
> 22 ik; = n by the deletion of zero terms, and thus we have an alternate form 
of Folkert’s formula. 


5. Probability Distributions for Cases III and IV. In case III, we have P(T) = 
n!"*, and in case IV, we have P(T) = D;’. In both cases, every z ¢ X is 
cyclic, since 3 is a collection of mappings which are one-to-one and onto in each 
case. As a consequence 
(5.1) Sr(z) = Pr(z) = Cr(z) = Ke(z). 


Therefore, many of the probability distributions considered for cases I and II 
coincide in cases III and TV. Hence, we consider only the distributions of / 
and r. Then, in case III, we have 


: n\ _;. nate a Bat po 
(5.2) P(l =j) = {(") (j — 1)!j(n a] / nn: 1/n. 


Gontcharoff [4] has shown that the probability that the number of components 
r of T is k is given by 


(4.33) 


(5.3) P(r =k) = coefficient of ¢ in Ait hon ten, 


and therefore is given by the well-known result, 
(5.4) P(r = k) = |S4\/n}, 


which may be found in Riordan [8]. Gontcharoff [4] has also shown that the 
distribution of (r — Er)/c, is asymptotically normally distributed with mean 0 
and variance 1. Feller (2] and Greenwood [5] have also computed Er and ¢? . 
We show an alternative computation using (5.2). 





1058 BERNARD HARRIS 


Let m; be the number of components of T with exactly j elements. Then 
(5.5) Br = EX m; = ¥ Em,. 
= i 
From (5.2) we note that Em; = 1/j, and hence 
(5.6) Er = > 5. — logn + y, 


j=l J 


where y is Euler’s constant. 
The variance has been shown to be 


(5.7) (3 4) tog + -7 


j=1 6 


9 


In case IV, we have 


P(l=j) = {(") G- 115 Das |/ tm D,) 
ia (n — 1)! D.-; 
(n — 3)! Dn ’ 
For large n; and j 2 2 and sufficiently small compared to n, 
(5.9) P(l = j) — 1/n, 


(5.8) 


Furthermore 
P(l =n) ~e/n 
P(t\=n-—1) =0 
P(lL=n— 2) ~e/2n 
P(l =n — 3) ~ e/3n. 


To get the probability distribution of the number of components, we employ 
the same type of generating function used earlier. First we note that 


(5.10) P(Kr(z) = X) = (n— 1)!/D, 


since all (n — 1)! n-cycles belong to 3. From this, we obtain: 


ni i!*221** .-- (n — 1)! 


PCT € Say seg.--sta) = D.2!™ 31" nl ™ ki kel hil TL hal Fal = Feel 


(5.11) 
n! 


~ Da D* a --- m™ kel --- Kel! 





RANDOM MAPPINGS 


where 0 < kj S nand >-2. ik; = n. Then 


' ke k 
can ERON 
(5.12) eee 

n! x 

“25 D. (3 += Petre 

where r = > -7.4k,. Proceeding as before, we have 


(5.13) G(2.,%:,°°:) = * exp > = 


t—2 
Replacing z; by zx’, we have 
- zi = D; x’. 
a0 nok Bete 
If we replace x; by tr’ in (5.13), we obtain 

n! 2 : t 
(5.15) G( tz’, tr’, ---) = D. ~ expt 2 - -7(% >, “| : 
ime dD, ju J: 


Since 


‘ 


yi — log (1 — z) —2, 


t—2 


G( tz’, tr’, ++ -) = ni exp —t (log(l—z) +z] 


(5.16) mark 
wi anes 
D, (1 — 2x)" 
Then 
(5.17) coefficient of tx” in G(ta’, tz’, ---) = P(r = k). 
From (5.15) we have 
’ ~~ ive 
itis ‘ n! z 
(5.18) coefficient of = Di (= =) ; 


and, expanding by the multinomial theorem, we get 
coefficient of ¢ 


awn: Zita (=) '(E 
: kgdky.-= "send Ke! kg! ~~ kal \2 ; 


n 
Lkyamk 
1 





1060 BERNARD HARRIS 


and thus 
(5.20) P(r = hk) = SS hal hal + eg P88. nh, 
Dau tales **he 


the sum over all non negative n — 1 tuples ky, ks, --- , k,n with }-2,k; = k 
and 5-7, ik; = n. Another form of the same result is obtained from (5.16). 
Here we have 


2 fang pPrt( 4 hes | 
(5.21) coefficient of x” = n!/D, 7 ete. i ae eel SI~)) 
j-0 (n —j)! 7 


Hence, we find that the coefficient of t‘x” is 


n ( <= 9)**4 | ow | 


(5.22) P(r = k) =nt/D, >, 
j=O 


’ 


(n — j)1j! 
or 
7 = (—1)’| Si-3| 
(5.23) P(r =k) = n!/D, ——_——_. 
: (D. 24 G = Dig! 
Since 
(5.24) Er = coefficient of x” in (d/dt)G(tz’, tx’, ---)| gas 


and 

(5.25) Er(r — 1) = coefficient of x" in (d’/dt’)G(tz’, tx’, ---)| aur, 

then 

(5.26) Er = coefficient of x” in (n!/D,)(e*/(1 — z))(— log (1 — z) — 2), 
(5.27) Er(r — 1) = coefficient of x" in (n!/D,)(e“"/(1 — z)) (log(1 — xz) + 2)’. 
Expanding (5.26) and (5.27) in a power series we obtain 


52 ye MS _ De 
(5.28) Er D.. - a(n ae 8)! > 
- My De 1 
Dy t= (n — 8)! 4252 gk 
(5.29) omer 
nigh Des 51 
D, t= (n — 8)! 52 9(8 — j) 
Since D; = (j!/e) + O(1), 
Fe 
Er~e>- ne Bee = + O(1). 


s(n — 8)! a2 





RANDOM MAPPINGS 


Hence 


(5.30) Er — logn + O(1). 


Similarly 


rl< i< 1 
mee Do Hs | 


o—4 (Nn — 8)! 8 f= 


(n — 8 


(5.31) 


where 7 is Euler’s constant. Hence 


Er(r - 1) ES 2[ hoe » - i -1 -1+7+0(2)] +00, 


and ] 


and 

(5.32) Er(r — 1) = log’ n + 2(y — 1) logn + O(1). 
Thus 

(5.33) a: — (2y — 1) logn + O(1). 


6. Miscellaneous Remarks. The problem of random mappings is of interest in 
various studies of human behaviors. We produce one such example. If we ask each 
of n individuals in a group to name his best friend from among the members of 
the group, the individual asked is the element z, and his choice Tz. In this case 
we have Tz ~ zx, and the hypothesis of “randomness” leads to case II. 


APPENDIX 
Index of Notations Having a Fixed Meaning 


(X, 5, P)-random mapping space 

X-~a finite set of n elements 

3~a set of transformations T of X into X 

P-a probability measure over 3 

Case I-is set of all transformations of X into X, P(T) = n™ for each T ¢ 3 

Case II-is set of all transformations of X into X with Tz # z for each z ¢ X, 
P(T) = (n — 1) for each T ¢ 5 

Case III-is set of all one-to-one mappings of X onto X, P(T) = n!" for each 
Tes 

Case I[V-is set of all one-to-one mappings of X onto X with Tz # z for each 
ze X, P(T) = D;", D, is the nth derangement number for each T ¢ 5 

S,7(z)-the set of all images of z in T. 

P,(x)-the set of predecessors of X in T. 

C,(xz)-the cycle containing z. 

K,(z)-the component containing z. 





1062 BERNARD HARRIS , 
&r(z), s-the number of elements in S,(z). 

pr(x), p-the number of elements in P,7(z). 

ly(x), l-the number of elements in cycle contained in K,7(xr). 
qr(x), g-the number of cyclical elements of X in T. 

ry, r-the number of components of T. 

S‘-Stirling’s Numbers of the First Kind. 


Siejjeo,---k, the subset of 3 with k; components with exactly i elements, 
+= 1,2,--- ,n. 


REFERENCES 


{1} E. T. Bew, ‘‘Exponential polynomials,’’ Ann. Math., Vol. 35 (1934), pp. 258-277. 

(2) Witt1aM Feuer, An Introduction to Probability Theory and its Applications, 1|st ed_., 
John Wiley and Sons, New York, 1950. 

(3) Jay E. Foixerrt, ‘The distribution of the number of components of a random mapping 
function,’’ Unpublished Ph.D. Dissertation, Michigan State University, 1955. 

[4] W. Gontcnarorr, “On the field of combinatory analysis,’’ Bull. de l’Academie des 
Sciences de U.R.S.S., (in Russian, with a French Summary), Serie Mathema- 
tique, Vol. 8 (1944), pp. 1-48. 

[5] R. E. Greenwoop, ‘“‘The number of cycles associated with the elements of a permuta- 
tion group,’’ Amer. Math. Monthly, Vol. 60 (1953), pp. 407-409. 

[6] Cuartes Jorpan, Calculus of Finite Differences, Chelsea Publishing Company, New 
York, 1950. 

[7] Leo Karz, ‘“‘Probability of indecomposability of a random mapping function,’ Ann. 
Math. Stat., Vol. 26 (1955), pp. 512-517. 

[8] Jonn Rrorpan, An Introduction to Combinatorial Analysis, John Wiley and Sons, New 
York, 1958. 


{9} H. Rusin anv R. Sirereaves, ‘Probability distributions related to random transfor- 


mations on a finite set,’’ Tech. Rept. No. 19A, Applied Mathematics and Sta- 
tistics Laboratory, Stanford University, 1954. 





SOME ASYMPTOTIC RESULTS FOR A COVERAGE PROBLEM 


By Max HALPERIN 
Knolls Atomic Power Laboratory 


1. Introduction. A quantity whose distribution is of considerable interest in 
calculation of microscopic behavior of heterogeneous materials is the intercept 
fraction of the phases of the mixture (i.e., the fraction of a linear path intercepted 
by a particular phase). For example, in nuclear reactor theory, one will be in- 
terested in the fraction of a neutron path through a given phase. In this paper, 
we study the statistical behavior of the intercept fraction, for a path of fixed 
length, under the following idealization (more precisely defined in Section 2): 

1. Linear sections of a phase are selected at random and placed on a very long 
line at random, without overlap. 

2. The given path length is placed at random on the long line. 

Some related experimental work has been done [1], with photomicrographs of 
sections on solid Boron Carbide-Zirconium mixtures. In this work a number of 
linear paths of fixed length and parallel to one axis of the photograph were taken 
at positions along the other axis of the photograph and the length of Boron 
Carbide covered by each path was measured. It is interesting to note that the 
frequency of zero fraction intercept predicted by the idealization was in good 
agreement with experimental results. The small differences found were in the 
direction suggested by the fact that the sampled line had to be of finite thick- 
ness. That is, the predicted frequency of zero intercept tended to be slightly 
higher than observed. 


2. Assumptions and Summary of Results. We assume we have a sample of 
line segments A, , 4,,--- , 4, which are independent random drawings from a 
universe of segments with probability density p(A), 0 s A S Aw. Now we 
suppose that the segments A,, A,,---, 4, are placed on the interval (0, L) 
in such a way that all admissible configurations of the segments are equally 
likely and that L = nAy . We call a configuration admissible if 

(a) There is no overlapping among segments. 

(b) No segments overlap zero or L. 

Now we consider a line of length \ < L and place it at random on (0, L) 
with the restriction that no overlap with zero or L occurs. 

Finally we define the intercept (or coverage) of \, AF, as that part of \ covered 
by the segments A, , 4,,---, 4,, and ask for the distribution of \/. In par- 
ticular, we consider the limiting distribution of AF as n — ~, for nu/L = V, 
0 < V < 1, where u is average segment size. 

We find that 


I. lim Pr {AF = 0} = (1 — V) exp — ad, where a = V/(1 — V)u. 


Received December 7, 1959; revised May 27, 1960. 
1063 





MAX HALPERIN 


V am 
tim Pr (QF =a} =~ f(x — »)p(e) de, 4d < de, 
p or 


non 


= 0, ifr = Aw. 


III. For 0 < AF < X, there are a number of distinct continuous contributions 
to the cumulative probability which unfortunately are extremely dependent on 
the specific nature of p( A). Because of the large numbers and complexity of these 
contributions, we defer a more or less detailed listing of this result and only note 
that for the simplest of these contributions, the probability integral is given by 


~o min(AF ,#A a) sa . 
(1-V) > f la(a — 2)] exp — a(\ — z)p,(z) dz, 


s! 


where p,(z) is the s-fold convolution of p(A). 


ou max (A,4 a) 
/ zp(x) dx — [ (x — A)p(z) as} a 


min(A,4 yw) 


IV. EF = V - wt 

ue 

V. It was not feasible to obtain a variance for \F for the asymptotic distri- 
bution. If, however, one further assumes that \ — ~, it is found that, for large 
d, Var AF = wV(1 — V)? [1 + (07/p*)JA, where ~* is the variance of the dis- 
tribution p( A), and is assumed finite. 

VI. As a matter of further interest, the probability distribution of “‘admissible 
configurations” mentioned above is apparently a novel generalization of the 
joint distribution of n independent uniformly distributed random variables. In 
addition to its use in the present problem, one can derive from this distribution, 
a solution to a one dimension nearest neighbor problem for non-infinitesimal one 
dimensional “particles”; this result is at least suggestive of a correction for the 
three dimensional problem when the three dimensional particles are not in- 
finitesimal. Details are given later in the discussion. The nearest neighbor prob- 
lem is of interest in both physics (e.g., Chandresekhar [2]) and biology (e.g., 
Clarke and Evans [{3)). 


3. Joint Distribution of Admissible Configurations. Consider some one of the 
admissible configurations of A,;, 4, --- , 4,. Denote the segment closest to 
zero by 6, , next closest by 6. and so on so that 6; , 6, --- , 6, is some permuta- 
tion of A; , de, --: , 4,, numbered according to segment order on (0, L). Let 
x;(j = 1, 2,---,m) denote the position of the midpoint of 6; on (0, L). Then 
our stipulation of equi-probability of admissible configurations requires that the 
joint probability density of 2, , z.,--- , 2, , say A(x), using a vectorial nota- 
tion, be given by 


(3.1) h(x) = (/ ax) 


where E,, is the domain of possible values of x for all possible vectors 8 (of 
which there are n!) 





A COVERAGE PROBLEM 
For given 5, it is almost obvious that one must have 


a—l 
> i +. St. SL — fa, 
1 


n—2 
dbs + $bn1 S Tar S te — 4(8n-1 + 8), 


j-1 
dos + $b; S tj S 2jas — 4(8; + 8j41), 


45, S ty S 22 — 4(4: + 82). 


One readily finds that, for given 6, integration over the region just defined gives 
a volume (L — > f 4,)"/n! and, since there are n! possible vectors 8, one has 


(3.3) p(x) = 1/(L — SP &)" 


for each admissible configuration. We note that this is a straightforward gen- 
eralization of the joint distribution of n independent variates each uniform on 
(0, L). In addition to being basic to the discussion of this paper, it is of interest 
to note that one can, for this case, obtain the exact distribution of the nearest 
neighbor distance defined by 


(3.4) d, = min {| 2; — 2;| — $(4; + 4;)}j, 


where z, and z; are not ordered but simply denote the position of the midpoints 
of A; and A;. Because this distribution is not directly relevant to the present 
problem, we defer a somewhat more detailed discussion to an appendix. 

From (3.3) and (3.2) we can deduce various marginal distributions.’ In par- 
ticular, we can show that the distribution density for z; , the jth order statistic, 
given that 5; is a particular one of the A’s and that 6 ,--- 8j.4, dj. --+ , b 
correspond to a particular set of A’s, is 


h(2;) | 5) 


j~1 1 © n—j 
(3.5) ' ee («, _ 2» i — ss) (« - oy — 46, . *,) 

~ G-Din-j! ; (L — T.)* 
where T, = >! 4, and 


j-1 
Lia +s S28 L- 2h — 4d; . 


1 We shall use the notation h(z;, , zi, , *** , zs, |), Tr S n, to mean the joint distribu- 
tion of the i,th, --- , i-th order statistics given 4;, , --* , 8:, and also the sets of 4's to the 
left and right of the corresponding order statistics. For brevity we shall also verbally de- 
scribe this distribution as the density of z;, ,--* , 2, given &. 





1066 MAX HALPERIN 


We can also show that the density of z; and z;,,., (j = 1,2,---,n—#s+1; 
8 = 2, 3,---,m) given 6 is given by 


j-1 
(3.6) play, 2j4e-2|8) = Onj(2y — dos — 45;)*" 


j+s—2 n 


lessen — 2p Fe BBs t Bie aL — ede — Hise — Zte a), 
? j+e 


where Doi" & + 46; S 2) S 2jrea — DT * by — 48; + 8540-2), 
j+e—2 n 

X bi + $8j4.4 S tipi SL ds — F544, 

je 
and 6,;, = n!/[(j — 1)! (8 — 2)! (n — 7 — s + 1)!]. From these examples 
it is clear how one can, by analogy with the order statistic distribution of n inde- 
pendent rectangular variates on (0, L), immediately set down any of the multi- 
variate marginal distributions stemming from (3.3). 


4. Outline of Derivation of Distribution of Coverage. Consider first 
Pr {AF = 0}. Denoting by y the position on (0, L) of the midpoint of the line 


of length A, we observe that, for given 5, in order to have zero coverage we must 
have either some one of the following events occurring: 


(4.1a) {aj + 46; Sy— Fr and 241 — $5j41 2 y + 4A}, 


j3=1,2,---,n—1, 
or 


(4.1b) m— 4h 2y +h, 


or 
(4.1¢) tn + 46, Sy — HA. 


We consider this case in some detail, both because of its simplicity and because 
it will serve to exemplify accurately the type if not the amount of tedious solving 
of simultaneous inequalities which appears to be essential in the solution of the 
problem. 


From (4.la) and (3.6), for a specific value of 7, and from our assumptions 
on y, we must have 


max {}A, 2; + $6; + 4A} Sy S min {xj — Bj — HA, L — 4h, 


j—1 


Dd & + 48; S 2; S tj. — 3(8; + 854), 
1 


> & + Fir S tur SL — Da — Byaus. 
1 


3+2 


The bounds on y in (4.2) will evidently always be given by 


(4.2a) a, +h6;+4\ Sy S tar — Bj — 9A. 





A COVERAGE PROBLEM 1067 


However, for (4.2a) to give rise to a non-zero probability, we must have 
(4.2b) i SB Vjar — 9(8; + 8j41) — X. 
Thus, from (4.2), 
j-1 
(4.2¢) Do bn + 48; S ty S Bhar — 4(8; + Bj4y:) — 2, 


which will only give rise to a non-zero probability if 


j 
(4.24) Zjun 2 2 Bn + Bbjar + 2. 


Hence, one can reduce (4.2) to 


ryt hy +4 Sy S Bar — Fyjy — HP, 


j-1 
(4.3) dbs + 4b; S xy S 2j41 — (8; + Bjyx) — 2, 


j n 
> & + MutrA Stn SL 2% — $54: 
T +2 


Now we perform the transformations 


j—-1 


u =2;—- D& — };, 


i 
j 


w= 244 — Di Ben, 


(4.4) 


to reduce (4.3) to 


j j 
u+ Lat Psy svt Pa-h, 


Osusw- i, 
(4.4) applied to (3.6) leads to a joint density for y, u, w, 


n! ue (L—T, —w)”** 
G-tDin—j—1)! Copy: 


Integrating on y according to (4.4) one gets (4.6) multiplied by (w — u — \), 
the important point being that the resultant expression is independent of the 
segment size distribution except for the total length, T, , of the n segments. 
Since each possible 6 has probability 1/n!, it is evident that averaging over the 
allocation of the A’s to the various orders does not change the result so far ob- 
tained. Now, however, we sum on j from 1 to n — 1 to get 


(4.6) 


(4.7) nl (w—u—d)(L ~ T, — w + u)"™ 
"7 (n — 2)! (L — T,)*L 





1068 MAX HALPERIN 


Letting z = w — u — d be a transform on u, and integrating on w and z one 
gets the exact result for given T, , 

te _ (L—T, —)"™" 
(4.8) Pr {AF = 0/T,} = - (L— TL 
Writing L = nu/V, T, = nA we have 


au 
(4.9) Pr {AF = 0} = / {1 — Vd/[n(u — VA)}I"[1 — (VA/u)\gn(A) da, 
0 


where g,(A) is the density of A. It then follows by standard arguments that 
(4.10) lim Pr {AF = 0} = (1 — V) exp — VA/(1 — V)uy. 


Now consider Pr {AF = \}. First we note that if \ 2 Ay, complete coverage 
can only be achieved when two or more of the A’s form a continuous line seg- 
ment of length 2 and cover \; but from the continuity of the distribution of 
segment midpoint coordinates, the probability of two or more of the A’s forming 
a continuous line segment is zero. Thus we need only consider Pr {AF = X} if 
\ < Ay. In such case, one must have for \F = X, one of the disjoint events 


(4.11) {zy — 46; S y — $A; 2; + 48; 2 y + 4A, j =1,2,-++,n. 


Taking into account (3.5) and the requirements on the limits for a non-zero 
probability, (4.11) becomes, for a particular /, 

aj— 3;+ 9 Sy S 2; + 38; — DD, 
(4.11a) jt " 

Di + i; S2;SL—- Da-3;, A<8; S dw. 

1 j+1 
Integrating the joint density of y, z;, 6; over the indicated limits and using 
L = ny/V, one gets for a given j and any permutation of the A’s 


v s** 
(4.11b) Pr (wr =alj}=“ f° (e—»)ple) ar, 
en ¥>r 


independently of 7. Thus the result cited earlier follows immediately. 

Now we go on to discuss briefly the work involved in computing other con- 
tributions to the cumulative probability function of \F. If we ask for Pr {AF < C} 
we can distinguish, aside from the cases already considered, three distinct cases: 

A. s of the segments lie completely within (y — 4A, y + 3A); n — 8 of the 
segments lie completely outside (y — 4A, y + 4A); the segment farthest to the 
left on (0, L) and still wholly within (y — 4A, y + 4A) may be the jth segment 
on (0, L) counting from zero; j = 1, 2, ---,n—s+1;8=1, 2, ---,n. 
For a given s and j and permutation and position of the A’s it is evident that 
\F = 5°3*** 4, and one can verify that, in addition to the restrictions imposed 
by the distribution of x and y, one must also satisfy, except for 7 = lors = n, 
max {25-1 + 4A + 465-1, Zi4ea — 4A + $654.4} 

j+e-—1 


S y S min {xj + 9A — 48;, 2j40 — FA — B54, 0s > & SC. 
1 





A COVERAGE PROBLEM 1069 


For the exceptional case where j7 = 1 but s # n, the inequalities to be satisfied 
become, for given s, 


t.— 4X + $6 S y S min {x + 4A — 48, Der — BA — Ful, Os Lh SC, 


and forj = l,s = n, 
%—-}\ +%,SySun+h-}f, OF Los C. 


By an analysis similar to but somewhat more arduous than the cases already 
considered, one obtains as the eract probability, for this case, that AF s C, 


1 8 fee f- (" (x— 2)" (L-v- ge 


(4.12) Lit Je (L-z-y* 


P(X) Pay) dy dz, 


where p,(z), Pa.(y) are s- and (n — s)-fold convolutions respectively of the 
segment size distribution. Since z + y = }-f A, we transform (4.12) to a sum 
of integrals over the joint distribution of z and standardized z + y. An appro- 
priate asymptotic argument gives, as cited in the summary, 


2 min(C iy) ET 
(4.13) (1-—V) yf ase =)l exp — a(\ — z)p,(x) dz. 


B. s — 1 of the segments lie completely within (y — 4A, y + 4A); one seg- 
ment lies partially within (y — 4A, y + 4A) and is the segment farthest to the 
left on (0, L) and still having an interval in common with (y — 4A, y + 4A); 
n — s segments lie completely outside (y — 4A, y + 4A). The partjally covered 
segment may be the jth segment on (0, L) counting from zero; j = 1, 2, - 
n—s+1;s = 1,2,-+-,m. Fora given s and j and permutation and position 
of the A’s, one can show that, for admissible positions, 

j+e-1 


MP o= 2; — y + (A+ 8;) + 2 bs 
’ 


and verify that, in addition to the restriction imposed by the distribution of x 
and y one must also satisfy, for AF s C, 


jt+e-1 


ry -C +5 (448) + 2 dy 


2 +5 (+8) | 
max 2 +5 (0-4) Ss y S min< ( 
Lise — 5A + Bye) 


1 
Lj+e-1 —_ \ — 2 (540-1) 


It is clear that there are again exceptional values of j and s for which the above 
inequality must be slightly modified. We omit a detailed consideration. Further, 
by symmetry, it is clear that there will be an identical contribution to 





1070 MAX HALPERIN 


Pr {AF s C} when the single segment partially covered is the segment farthest 
to the right on (0,2) and still having an interval in common with 
(y — 4A, y + 4A). By increasingly tedious analysis of the relevant inequalities 
one can show that’ except for terms of O(n™) the contributions to Pr {AF < C} 
for case B are given by 


nw (n—1\ [°° pp? (L-vA-—2-w) 
cap SHEVIG CEES 
‘(A —2—y)""(L —X— 2 — w— 2)" "p(w, 2, y) dz dw dy dx 


and 


(4.14b)* T > . wh ie a: ee oe 


‘(A—2-y)" (L—’—2 — w —2z)” "p(w, z, y) dz dw dy dz, 


where p(w, z,¥) = Pa-.(w)p,.(y) p(x) and, as before, p,(x) is the r-fold con- 
volution of p(x). Again noting z + y + w = )-} Ay we can transform (4.14a) 
and (4.14b) to a series of integrals over the joint distribution of z, z, y, and 
standardized z + y + w. From an asymptotic argument whose details we omit, 
one obtains 


—z rt ee _ -y cote z)\"" 
(4.15a)° om rel f: (s—1)! | 


‘exp — a(\ — y — z)piuly) p(x) dz dy dz 


and 


(4.15b)" am Sek ct. maa 2)" 


‘exp — a(A — y — z)pialy) p(x) dz dy dz. 


C. (s — 2) of the segments lie completely within (y — 4A, y + 4A); two 
segments lie partially within (y — 4A, y + 4A); » — 8 segments are completely 
outside (y — 4A, y + 4A). The partially covered segment on the left of 
(y — 4\,y + 4A) may be the jth segment on (0,Z) counting from zero; 
p= 1,2,---,n—8+1;8 = 2,3,---,n 

For a given s and j and permutation and admissible position of the A’s, one 
can show that 


j+e-—2 


MF = 2; — Sjpea tA + 43(8; + Biren) + YO & 


j+1 


? For notational simplicity we have written upper limits to z and y as C and C — z, 
respectively; they should, of course, be min(C, Ay) and min[{[C — z, (s — 1)Ayw respectively. 
* Remarks similar to footnote reference two apply here. 





A COVERAGE PROBLEM 1071 


and verify that, in addition to the restrictions imposed by the distribution of x 
and y, one must also satisfy, for AF < C, 


max {2; + 4(A — 8;), Zire — BCA + 8j4e-4)} 
Sy S min {zj + 4(A + 85), Zip — HA — 8y404)} 


j+e-—2 
2j SC + Zj4ea — (8; + 8j4e4) — 2 by — 2. 
There are again exceptional values of j and s for which the above inequalities 
must be modified. We omit a detailed consideration. 

By analyses of the type outlined above we can derive expressions for con- 
tributions to Pr {AF s C} of essentially the type already obtained for previous 
cases. Unfortunately, this last case involves a rather large number of distinct 
sub-cases; for brevity we give a typical result for this case, 


7 3 af ya. [a(A be atl 


‘exp — az}p,2(y)p(x2)p(a) dz dy dx, dx; ; 


if Aw S C S 4/2. Other contributions for this case differ only in limits of inte- 
gration and values of C for which they are applicable. 

We close this section by remarking that one could in the results above replace 
the convolution densities with appropriate Fourier integrals; reasonable speci- 
fications of p(x) (e.g., that it be representable as a polynomial in z) would then 
allow all necessary integration to proceed in a straightforward manner. How- 
ever, it seems unlikely that such a computation would result in a useful repre- 
sentation and therefore no such computation has been attempted. 

One might also mention that the Fourier integral representation allows summation of all the infinite series obtained above. The resulting integrals, however, do not appear tractable even under specific assumptions about the form of $p(x)$.


5. Mean and Variance of Coverage. Although the exact limiting distribution of $\lambda F$ has been given in Section 4, it has not been possible to derive useful forms for the variance of $\lambda F$ from this result, except, as described later in this section, under additional restrictions on the distribution.

One can, however, compute the asymptotic ($n \to \infty$, $n\mu/L \to V$) expected value of $\lambda F$ using the obvious fact that $\lambda F$ is a sum of $n$ (non-independent) random variables, each with the same expectation. Thus, if we denote by $C_i$ the portion of a segment of length $\Delta_i$ that is covered by $(y - \tfrac{1}{2}\lambda, y + \tfrac{1}{2}\lambda)$, then $E\lambda F = \sum_{i=1}^{n} EC_i$. To compute $EC_i$ it is necessary to consider two cases:

A. The segment is completely covered. 

B. The segment is partially covered. 

In case A, the particular segment can be the $j$th segment, $j = 1, 2, \cdots, n$,





1072 MAX HALPERIN 


in order on $(0, L)$, and for any permutation of the $\Delta$'s and particular $j$, the coordinate of the midpoint of the segment must satisfy

$$y - \tfrac{1}{2}(\lambda - \Delta_j) \le z_j \le y + \tfrac{1}{2}(\lambda - \Delta_j)$$


in addition to the requirements imposed by the distribution of $x$ and $y$. Again a reduction of the relevant inequalities leads to a contribution to $E\lambda F$, neglecting terms of $O(n^{-1})$, of

$$\frac{V}{\mu}\int_0^{\min(\lambda, \Delta_M)} x(\lambda - x)\,p(x)\,dx.$$


For case B, for a particular $j$ and permutation of the $\Delta$'s, the additional requirement on $z_j$ is [if the particle covers the left boundary of $(y - \tfrac{1}{2}\lambda, y + \tfrac{1}{2}\lambda)$]

$$y - \tfrac{1}{2}(\lambda + \Delta_j) \le z_j \le y - \tfrac{1}{2}(\lambda - \Delta_j).$$

A reduction of the relevant inequalities leads to a contribution for this case (which, by symmetry, we double) of

$$\frac{V}{\mu}\int_0^{\min(\lambda, \Delta_M)} x^2\,p(x)\,dx.$$
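For a single segment of length $x \le \lambda$, the case A term $x(\lambda - x)$ and the doubled case B term $x^2$ add up to $\lambda x$, which is the integral of the covered length over every midpoint position at which the segment meets the window. The sketch below is an illustrative numerical check of that bookkeeping, not part of the original argument; the function names are hypothetical.

```python
def covered_length(z, x, y, lam):
    """Overlap of the segment (z - x/2, z + x/2) with the window (y - lam/2, y + lam/2)."""
    lo = max(z - x / 2, y - lam / 2)
    hi = min(z + x / 2, y + lam / 2)
    return max(0.0, hi - lo)


def integrated_coverage(x, lam, y=0.0, steps=100_000):
    """Midpoint-rule integral of covered_length over every midpoint z where the segment meets the window."""
    a = y - (lam + x) / 2
    b = y + (lam + x) / 2
    h = (b - a) / steps
    return h * sum(covered_length(a + (k + 0.5) * h, x, y, lam) for k in range(steps))


x, lam = 0.3, 1.0
case_a = x * (lam - x)   # segment lies entirely inside the window
case_b = x ** 2          # segment straddles a boundary, doubled for the two ends
print(case_a + case_b, integrated_coverage(x, lam))   # both close to lam * x = 0.3
```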


Thus one has, neglecting terms of $O(n^{-1})$,

$$E\lambda F = \frac{\lambda V}{\mu}\int_0^{\min(\lambda, \Delta_M)} x\,p(x)\,dx.$$
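When $\lambda \ge \Delta_M$, the integral in the expected value above is $\int_0^{\Delta_M} x\,p(x)\,dx = \mu$, so $E\lambda F = \lambda V$. The Monte Carlo sketch below checks that limit under an assumed placement model: iid segment lengths, uniform on $(0.5, 1.5)$, with the free length $L - \sum_i \Delta_i$ split into $n + 1$ gaps by $n$ uniform cut points so that segments never overlap. The model, the parameter values, and the names are illustrative assumptions, not taken from Section 3.

```python
import random


def window_coverage(n, L, lam, draw_length, rng):
    """Covered length of the window (L/2 - lam/2, L/2 + lam/2) in one simulated layout."""
    lengths = [draw_length(rng) for _ in range(n)]
    free = L - sum(lengths)
    if free <= 0:
        raise ValueError("segments do not fit on (0, L)")
    cuts = sorted(rng.uniform(0.0, free) for _ in range(n))   # uniform spacings of the free length
    lo, hi = (L - lam) / 2, (L + lam) / 2
    covered, before = 0.0, 0.0          # before = total length of the segments to the left
    for cut, length in zip(cuts, lengths):
        left = cut + before             # left end of this segment
        covered += max(0.0, min(left + length, hi) - max(left, lo))
        before += length
    return covered


rng = random.Random(1960)
n, L, lam, mu = 200, 400.0, 5.0, 1.0    # V = n * mu / L = 0.5 and lam > Delta_M = 1.5
draw = lambda r: r.uniform(0.5, 1.5)    # hypothetical p(x) with mean mu = 1
reps = 1000
mean_cov = sum(window_coverage(n, L, lam, draw, rng) for _ in range(reps)) / reps
print(round(mean_cov, 2), lam * n * mu / L)   # simulated E(lambda F) versus lambda V
```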


However, if $\lambda < \Delta_M$, it follows from (4.11b) that there is an additional contribution to $E\lambda F$, given by

$$\frac{\lambda V}{\mu}\int_{\lambda}^{\Delta_M} (x - \lambda)\,p(x)\,dx.$$


Hence one can write 


pr =v YS 


ub 


min (A, 4) 


er xrp(z) dx — Sepa — r)p(z) az}. 


Unfortunately, it does not appear possible to calculate the variance in the manner suggested by the preceding argument. In such a calculation one again encounters integrals of the type derived in Section 4. This suggests that the variance of $\lambda F$ depends on $p(x)$ in a rather complex way. This suggestion gains further credence if one considers, as suggested at the close of Section 4, representing the convolution densities by appropriate Fourier transforms. One can then see quite readily that, for quite general $p(x)$, both the density and the moments of $\lambda F$ involve not only

$$M(\alpha) = \int_0^{\Delta_M} e^{\alpha x}\,p(x)\,dx,$$

but also its derivatives as well as related functions.







One can, however, obtain a variance for $\lambda$ large (or, more properly, the variance of the asymptotic distribution for $\lambda$ large) from the following considerations.

It is trivially true that $E(\lambda F)^2 \ge (E\lambda F)^2 = \lambda^2 V^2$. Since $0 \le \lambda F \le \lambda$, we also have $E(\lambda F)^2 \le \lambda E\lambda F = \lambda^2 V$. Hence, for some $\theta$, $0 < \theta < 1$, one has $\operatorname{Var}\lambda F = \theta\lambda^2 V(1 - V)$. Note that $\theta$ may be a function of $\lambda$, $V$, or the segment size distribution $p(x)$. Now we consider that part of the density of $\lambda F$ arising from (4.13); one has, for $C \le s\Delta_M$,


$$(1 - V)\,\frac{[\alpha(\lambda - C)]^{s}}{s!}\,e^{-\alpha(\lambda - C)}\,p_s(C). \tag{5.1}$$
In (5.1) we make the following transformations: 
5 a = ro] /|ra-9 
oe ' E wi —V)I/ La — V) 
and

$$z = \frac{C - \lambda V}{[\theta\lambda^2 V(1 - V)]^{1/2}}. \tag{5.2b}$$


(5.2a) suggests itself because of the Poisson factor in (5.1); (5.2b) is, of course, the natural standardization for $C$. If one goes through the details of an asymptotic argument for fixed $z$ and $t$, one finds that as $\lambda \to \infty$ we can write (5.1) as


r\2 r\t 22 + 2 


Qn o V 


2 2V(1 — V)e? o 


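provided $\theta = [\mu/(V\lambda)]\zeta$, where $\zeta$ is a constant independent of $\lambda$. Integrating on $t$, one finds that $z$ will be asymptotically $N(0, 1)$ provided

$$\theta = \frac{\mu}{\lambda}(1 - V)\left[1 + \frac{\sigma^2}{\mu^2}\right].$$

Thus one has, as the variance of $\lambda F$ for large $\lambda$,

$$\operatorname{Var}\lambda F = \lambda\mu V(1 - V)^2\left[1 + \frac{\sigma^2}{\mu^2}\right]. \tag{5.3}$$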


It is easy to show that (5.3) also holds for $\lambda \to \infty$ and $V \to 0$ with $V\lambda/\mu$ approaching a constant; this result follows immediately upon appropriate substitutions in the various integrals and passage to the limits in $\lambda$ and $V$.

Note also that (5.2) has an “extraneous” factor (after integration on $t$) of $(1 - V)^2$. This is simply the probability, for large $\lambda$, that one has non-zero coverage and that all segments which are covered at all are completely covered. A similar computation for (4.15a) and (4.15b) shows that the asymptotic probability of non-zero coverage with partial coverage of exactly one of the two end segments on $(y - \tfrac{1}{2}\lambda, y + \tfrac{1}{2}\lambda)$ is $2V(1 - V)$. By subtraction, since $\lim_{\lambda\to\infty}\Pr\{\lambda F = 0\} = 0$ and $\lim_{\lambda\to\infty}\Pr\{\lambda F = \lambda\} = 0$, the probability of non-zero coverage with partial coverage of both end segments is $V^2$.







One can verify that the argument we have given depends on the fact that $V\lambda/\mu \to \infty$ and $\lambda \to \infty$. For $V\lambda/\mu \to \infty$ and $\mu \to 0$, we can show, assuming (5.3) holds and $\sigma^2 < k\mu^2$, where $k$ is a constant independent of $\mu$, that the coverage converges in probability to $\lambda V$. Also note that as $\lambda \to 0$, the distribution of coverage approaches a two-point distribution,

$$\lim_{\lambda\to 0}\Pr\{\lambda F = 0\} = 1 - V; \qquad \lim_{\lambda\to 0}\Pr\{\lambda F = \lambda\} = V. \tag{5.4}$$


APPENDIX 
A Nearest Neighbor Problem 


The distribution of Section 3 may be used to formulate and solve a nearest neighbor problem which is at least suggestive of a correction to the same problem in three dimensions when particle size and volume fraction are not vanishingly small.

In the simplest version of the one-dimensional problem one assumes a set of values $x_1, x_2, \cdots, x_n$, randomly and independently selected on $0 \le x \le L$, and asks for the distribution of

$$d = \min_i |x_i - L/2|. \tag{A.1}$$

It is then easy to show that the cumulative distribution function of $d$ is given by

$$F(d) = 1 - [1 - (2d/L)]^n, \qquad 0 \le d \le L/2. \tag{A.2}$$


The problem is essentially unchanged if we define the nearest neighbor distance as

$$d_i = \min_{j \ne i} |x_j - x_i| \quad \text{for some fixed } i. \tag{A.3}$$

It can be shown that the cumulative distribution function of $d_i$ is

$$F(d_i) = \begin{cases} 1 - [1 - (2d_i/L)]^n - (2/n)\{[1 - (d_i/L)]^n - [1 - (2d_i/L)]^n\}, & 0 \le d_i \le L/2,\\ 1 - (2/n)[1 - (d_i/L)]^n, & L/2 \le d_i \le L, \end{cases} \tag{A.4}$$

which reduces to (A.2) for $n$ large.
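A quick Monte Carlo comparison with (A.4) follows; it is a sketch, and the function names, $n = 5$, and $L = 1$ are arbitrary choices. The first branch is coded with a minus sign in front of the $(2/n)$ term, which is the sign that makes the two branches agree at $d_i = L/2$.

```python
import random


def nn_cdf(d, n, L=1.0):
    """F(d_i) of (A.4): nearest-neighbour distance of one fixed point among n uniform points on (0, L)."""
    if d <= 0:
        return 0.0
    if d <= L / 2:
        near, far = (1 - 2 * d / L) ** n, (1 - d / L) ** n
        return 1 - near - (2 / n) * (far - near)
    if d <= L:
        return 1 - (2 / n) * (1 - d / L) ** n
    return 1.0


def simulated_cdf(d, n, L=1.0, reps=20_000, seed=1):
    """Fraction of simulated samples in which the first point's nearest neighbour is within d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.uniform(0.0, L) for _ in range(n)]
        d_i = min(abs(x - xs[0]) for x in xs[1:])   # xs[0] plays the role of the fixed point
        hits += d_i <= d
    return hits / reps


for d in (0.05, 0.2, 0.6):
    print(d, round(nn_cdf(d, n=5), 4), round(simulated_cdf(d, n=5), 4))
```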

The generalization of the above, based on Section 3, is to consider a set of segments $\Delta_1, \Delta_2, \cdots, \Delta_n$ ($\sum_i \Delta_i < L$) with midpoint coordinates $z_1, z_2, \cdots, z_n$ on $(0, L)$. We then define the nearest neighbor distance analogous to (A.3) as

$$d_i = \min_{j \ne i}\{|z_j - z_i| - \tfrac{1}{2}(\Delta_i + \Delta_j)\} \quad \text{for some fixed } i. \tag{A.5}$$







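It can be shown that

$$F(d_i) = \begin{cases} 1 - \left[1 - \dfrac{2d_i}{(1 - a)L}\right]^n - \dfrac{2}{n}\left\{\left[1 - \dfrac{d_i}{(1 - a)L}\right]^n - \left[1 - \dfrac{2d_i}{(1 - a)L}\right]^n\right\}, & 0 \le d_i \le (1 - a)L/2,\\ 1 - \dfrac{2}{n}\left[1 - \dfrac{d_i}{(1 - a)L}\right]^n, & (1 - a)L/2 \le d_i \le (1 - a)L, \end{cases} \tag{A.6}$$

where $a = \sum_i \Delta_i / L$. From (A.6) one finds that the expected nearest neighbor distance is

$$E d_i = \frac{(1 - a)L}{2(n + 1)}\left[1 + \frac{2}{n}\right], \tag{A.7}$$

which differs from the analogous result based on (A.2) by the factor $(1 - a)[1 + (2/n)]$ and from the result based on (A.4) by the factor $(1 - a)$.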
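Integrating the survival function $1 - F(d_i)$ of (A.6) numerically reproduces (A.7). In this sketch `scale` stands for $(1 - a)L$ (setting $a = 0$ gives (A.4)); the names and the midpoint-rule step count are arbitrary choices.

```python
def nn_survival(d, n, scale):
    """1 - F(d_i) from (A.6), with scale = (1 - a) * L."""
    if d >= scale:
        return 0.0
    far = (1 - d / scale) ** n
    if d <= scale / 2:
        near = (1 - 2 * d / scale) ** n
        return near + (2 / n) * (far - near)
    return (2 / n) * far


def mean_by_quadrature(n, scale, steps=100_000):
    """E d_i as the integral of the survival function over (0, scale), midpoint rule."""
    h = scale / steps
    return h * sum(nn_survival((k + 0.5) * h, n, scale) for k in range(steps))


def mean_closed_form(n, scale):
    """(A.7): (1 - a) L / (2 (n + 1)) * [1 + 2/n]."""
    return scale / (2 * (n + 1)) * (1 + 2 / n)


for n in (2, 5, 20):
    print(n, mean_by_quadrature(n, 7.0), mean_closed_form(n, 7.0))
```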
This suggests that as a plausible approximation for the three dimensional ex- 
pected nearest neighbor distance [usually computed in generalization of (A.2)] 
one should use 


3(1 — a)p | (1/3) 
(A.8) |* - ne | — 


where p = V/n, V is the volume under consideration, a is the volume fraction 
taken up by particles, and n is the total number of particles. 

The result (A.6) is obtained by noting that $z_i$ may be any one of the $n$ order statistics on $(0, L)$, so that if $z_i$ is the $r$th order statistic, the nearest neighbor segment will correspond either to the $(r - 1)$th or to the $(r + 1)$th order statistic. This leads to a tedious analysis of inequalities along the lines indicated in Section 4. Fortunately, it turns out to be unnecessary to pursue the analysis in complete detail, since one finds that $F(d_i)$ depends only on $(1 - a)L$ and $n$. Since for $a = 0$ one must obtain (A.4), and since $a$ enters only through the product $(1 - a)L$, so that varying $a$ cannot change the form of the distribution, it follows that $F(d_i)$ is identical with (A.4) with $(1 - a)L$ replacing $L$.

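We also note that for (A.6) one has as the $s$th moment, ignoring terms of order higher than $O(n^{-s})$,

$$E d_i^{\,s} = \frac{(1 - a)^s L^s\,\Gamma(s + 1)\,\Gamma(n + 1)}{2^s\,\Gamma(n + s + 1)}. \tag{A.9}$$

In particular,

$$\operatorname{Var} d_i = \frac{(1 - a)^2 L^2 n}{4(n + 1)^2(n + 2)} \approx \frac{(1 - a)^2 L^2}{4n^2}.$$

If one assumes that the $\Delta$'s have a distribution with mean $\mu$ and that $n\mu/L = a$, then

$$E d_i = \frac{(1 - a)\mu}{2a}; \qquad \operatorname{Var} d_i = \frac{(1 - a)^2\mu^2}{4a^2}.$$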
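The moments in (A.9) are those of $\tfrac{1}{2}(1 - a)L$ times a Beta$(1, n)$ variate. The sketch below evaluates them with `math.lgamma`, checks the variance formula, and checks its large-$n$ form $(1 - a)^2\mu^2/(4a^2)$ with $L = n\mu/a$; the parameter values and names are arbitrary.

```python
from math import exp, lgamma


def moment_a9(s, n, scale):
    """(A.9): scale^s Gamma(s+1) Gamma(n+1) / (2^s Gamma(n+s+1)), with scale = (1 - a) L."""
    return (scale / 2) ** s * exp(lgamma(s + 1) + lgamma(n + 1) - lgamma(n + s + 1))


def var_from_a9(n, scale):
    return moment_a9(2, n, scale) - moment_a9(1, n, scale) ** 2


def var_closed_form(n, scale):
    return scale ** 2 * n / (4 * (n + 1) ** 2 * (n + 2))


n, a, mu = 10 ** 6, 0.3, 2.0
scale = (1 - a) * n * mu / a        # (1 - a) L with n * mu / L = a
print(var_from_a9(n, scale), var_closed_form(n, scale), (1 - a) ** 2 * mu ** 2 / (4 * a ** 2))
```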







There are a large number of further statistical problems that one could define based on the distribution of Section 3. One that seems worth mentioning in closing is the distribution of a randomly selected inter-segment distance (including the distance from zero to the first segment and the distance from the last segment to $L$). It is easy to show, denoting the inter-segment distance by $\delta$ and the probability density of $\delta$ by $h(\delta)$, that as $n \to \infty$,

$$h(\delta) = \frac{a}{\mu(1 - a)}\exp\left[-\frac{a\delta}{\mu(1 - a)}\right], \qquad 0 \le \delta < \infty. \tag{A.10}$$

Thus $E\delta = (1 - a)\mu/a$ and $\operatorname{Var}\delta = (1 - a)^2\mu^2/a^2$. Note that (A.10) is completely analogous to the solution of a similar problem for $n$ independently distributed rectangular variates on $(0, L)$. The only difference in (A.10) is the nature of the constant.
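A simulation sketch of (A.10) under an assumed layout model: iid segment lengths, uniform on $(0.5\mu, 1.5\mu)$, with the free length $L - \sum_i \Delta_i \approx (1 - a)L$ split into $n + 1$ gaps by $n$ uniform cut points. The model, the parameter values, and the names are illustrative assumptions; the check is that the gaps have mean and standard deviation both close to $(1 - a)\mu/a$, as an exponential density requires.

```python
import random
import statistics


def gaps_one_layout(n, L, mu, rng):
    """Inter-segment distances, including both end gaps, for one hypothetical layout."""
    lengths = [rng.uniform(0.5 * mu, 1.5 * mu) for _ in range(n)]   # hypothetical p(x) with mean mu
    free = L - sum(lengths)
    cuts = sorted(rng.uniform(0.0, free) for _ in range(n))
    edges = [0.0] + cuts + [free]
    return [right - left for left, right in zip(edges, edges[1:])]


rng = random.Random(7)
n, mu, a = 2000, 1.0, 0.4
L = n * mu / a                      # so that n * mu / L = a
gaps = [g for _ in range(20) for g in gaps_one_layout(n, L, mu, rng)]
print(round(statistics.mean(gaps), 3), round(statistics.stdev(gaps), 3), (1 - a) * mu / a)
```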


REFERENCES
[1] Randall, C. H., “Microscopic Effects in Multiphase Mediums,” Knolls Atomic Power Laboratory, Memorandum, KAPL-M-CHR-2.
[2] Chandrasekhar, S., “Stochastic problems in physics and astronomy,” Revs. of Modern Physics, Vol. 15 (1943), pp. 86-87.
[3] Clark, P. J. and Evans, F. C., “On some aspects of spatial patterns in biological populations,” Science, Vol. 121 (1955), pp. 397-398.





STATISTICAL PROGRAMMING¹


By D. F. Votaw, Jr.
Yale University²


1. Introduction. A “statistical programming” problem is encountered when the information about one or more constants in a programming problem is statistical. We shall first give examples of programming problems and then point out how certain statistical analogues of them arise. The results given in this paper pertain to these analogues.

Our first example is a transportation problem. Suppose we are given a unit amount of a homogeneous product (e.g., oil) at each of $n$ origins, the requirement that a unit amount be received at each of $n$ destinations, and the cost, say $c_{ij}$, of shipping a unit amount from the $i$th origin to the $j$th destination ($i, j = 1, \cdots, n$; $n \ge 2$). The problem is to find a most economical schedule of shipments of the product from origins to destinations. More specifically, find an $n \times n$ matrix $(x_{ij})$ of real numbers for which

$$\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}x_{ij} \tag{1}$$

assumes its minimum value, where

$$\sum_{i=1}^{n} x_{ij} = 1 \quad (j = 1, \cdots, n), \qquad \sum_{j=1}^{n} x_{ij} = 1 \quad (i = 1, \cdots, n), \qquad x_{ij} \ge 0. \tag{2}$$

$x_{ij}$ represents the amount shipped from the $i$th origin to the $j$th destination, and the matrix $(x_{ij})$ is called a “program.” The expression in (1) is the total shipping cost. The condition (2) expresses the facts that at each origin the sum of all amounts shipped away must equal 1 and that at each destination the sum of all amounts received from the origins must equal 1. The problem stated above is a special case of the Hitchcock-Koopmans transportation problem, which is a well-known special case of a linear programming problem (see [1], [2, Part 1]).

The next example is the personnel assignment problem, which is closely related to the first example (see [4], [5, pp. 255-258], and [8]). Let us replace “origins” by “persons,” “destinations” by “jobs,” and regard $c_{ij}$ as the produc-


Received June 9, 1950; revised May 23, 1960. 

¹ Research work sponsored in part by the Office of Naval Research. This paper includes some results in an unpublished paper of the same title presented at the Montreal meeting of the Institute of Mathematical Statistics, September, 1954 (see [6]).

² Now at MITRE Corporation.


1077 







placed full time on a job and that each job be filled; thus to the linear constraints in (2) on $(x_{ij})$ there must be added the following constraint, which is not linear:

$$\text{each } x_{ij} = 0 \text{ or } 1. \tag{3}$$

In view of (3) the admissible program matrices $(x_{ij})$ are simply the $n!$ permutation matrices of order $n \times n$. The problem is to find a permutation $(j_1, \cdots, j_n)$ such that

$$c_{1j_1} + \cdots + c_{nj_n} = \max_{(j'_1, \cdots, j'_n)} (c_{1j'_1} + \cdots + c_{nj'_n}), \tag{4}$$

where $(j'_1, \cdots, j'_n)$ denotes any permutation of $(1, \cdots, n)$. Having determined $(j_1, \cdots, j_n)$, we would assign person 1 to job $j_1$, $\cdots$, person $n$ to job $j_n$, and thereby obtain maximum average productivity of the group of $n$ persons relative to the jobs. Incidentally, when (3) does not hold, we may regard $x_{ij}$ as the fraction of the $i$th person's time allocated to the $j$th job.

A sum of the form $c_{1j_1} + \cdots + c_{nj_n}$ will be termed a “permutation sum of $c_{ij}$'s.” From [4, Lemma 2] we have that each program in the first example is a weighted mean of the $n \times n$ permutation matrices and that the sum in (1) is a weighted mean of permutation sums of $c_{ij}$'s. This implies that there is a permutation matrix $(x_{ij})$ for which the quantity in (1) assumes its minimum value. A similar result holds for the maximum value. Clearly, the transportation problem described in (1) and (2) and the assignment problem are nearly identical mathematically: for each problem the optimum sum is attained when $(x_{ij})$ is a permutation matrix. The differences between the two are: (i) in one we seek a maximum sum and in the other a minimum sum; (ii) the integer constraint (3) is part of the assignment problem but not of the transportation problem. It is noteworthy that both problems are related to a certain zero-sum, two-person game (see [4], pp. 7-11).
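For small $n$ the optimum in (4) can be found by listing all $n!$ permutation sums. The sketch below does exactly that with a hypothetical $3 \times 3$ matrix and 0-based indices; it is exhaustive search, not a method from the cited references, and for larger $n$ a polynomial-time method such as the Hungarian algorithm would be used instead.

```python
from itertools import permutations


def permutation_sum(c, perm):
    """c[0][perm[0]] + ... + c[n-1][perm[n-1]]: the permutation sum for perm."""
    return sum(c[i][j] for i, j in enumerate(perm))


def best_assignment(c, maximize=True):
    """Exhaustive search over all n! permutations; practical only for small n."""
    n = len(c)
    pick = max if maximize else min
    return pick(permutations(range(n)), key=lambda p: permutation_sum(c, p))


c = [[7, 2, 5],
     [3, 8, 4],
     [6, 1, 9]]          # hypothetical productivities c_ij
best = best_assignment(c)
print(best, permutation_sum(c, best))   # (0, 1, 2) 24
```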

The $c_{ij}$'s are constants in the two examples stated above, and it is presupposed that the numerical value of each is known to the “programmer.” When the values are in fact not all known to the programmer, statistical information regarding them may nevertheless be available (e.g., in the form of aptitude indexes of personnel (see [7]) or in the form of records of unit costs of past shipments). Such situations lead to the problems treated in this paper. As regards the game discussed in [4], it should be noted that a “pseudo-game” arises when the $c_{ij}$'s are not all known to both players. For a discussion of pseudo-games see [3, p. 357].

We shall set up a statistical analogue of the personnel assignment problem. 
By simply replacing “maximization” by “minimization,” we can transform the 
analogue into an analogue of the transportation problem. Both these analogues 
are related to pseudo-games based on the game in [4]. 

A generalization of the assignment problem and of the statistical analogue will 
be considered in Section 4. 


2. Statistical Programming. Consider an $n^2$-dimensional Euclidean sample space, $W^*$, and represent each point of $W^*$ by $(w_{11}, w_{12}, \cdots, w_{nn})$. Let $H(w_{11}, w_{12}, \cdots, w_{nn})$ be the distribution function associated with $W^*$, and let $(W_{11}, W_{12}, \cdots, W_{nn})$ be a random $n^2$-dimensional vector whose distribution is $H$.

We assume that the $c_{ij}$'s are known to be parameters of $H$; however, the numerical values of the $c_{ij}$'s are assumed to be unknown to the assigner. We also assume that an observed value, say $(w_{11}, w_{12}, \cdots, w_{nn})$, of $(W_{11}, W_{12}, \cdots, W_{nn})$ can be obtained by him. In general the observed value supplies him with statistical information regarding $c_{11}, c_{12}, \cdots, c_{nn}$. Having obtained $(w_{11}, w_{12}, \cdots, w_{nn})$, the assigner selects a permutation of $(1, \cdots, n)$.

With reference to the assignment problem, we define statistical programming as partitioning the sample space $W^*$ into $n!$ mutually exclusive and exhaustive subsets and establishing a one-to-one correspondence between the subsets and the $n!$ permutations of $(1, \cdots, n)$.* It is understood that when the observation $(w_{11}, w_{12}, \cdots, w_{nn})$ is obtained, one selects the permutation $(j_1, \cdots, j_n)$ corresponding to the subset, say $P_{j_1 \cdots j_n}$, in which $(w_{11}, w_{12}, \cdots, w_{nn})$ lies. In advance of obtaining an observed value of $(W_{11}, W_{12}, \cdots, W_{nn})$, the permutation to be selected is a random variable, say $(J_1, \cdots, J_n)$, and so the permutation sum $c_{1J_1} + \cdots + c_{nJ_n}$ is a random variable. We can regard the distribution of this random sum as a “performance function” characterizing the statistical programming with which it is associated.

A natural kind of statistical programming is that in which one considers estimates of the $n!$ permutation sums and selects the permutation corresponding to the largest estimate. This will be termed “programming by estimation.” A permutation sum of $w_{ij}$'s would often be a suitable point estimate of the corresponding permutation sum of $c_{ij}$'s. This leads to a set of $n!$ regions $P_{j_1 \cdots j_n}$ such that any given $P_{j_1 \cdots j_n}$ would contain the set of points in $W^*$ such that

$$w_{1j_1} + \cdots + w_{nj_n} > \max_{(j'_1, \cdots, j'_n) \ne (j_1, \cdots, j_n)} (w_{1j'_1} + \cdots + w_{nj'_n}). \tag{5}$$

Incidentally, formula (5) does not assign points of $W^*$ at which the largest permutation sum is attained by two or more permutations. We shall assume that $H(w_{11}, w_{12}, \cdots, w_{nn})$ is continuous; consequently, such points can be assigned to the subsets arbitrarily.
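Programming by estimation as defined by (5) applies the same search to the observed $w_{ij}$'s instead of the unknown $c_{ij}$'s. A minimal sketch with hypothetical names and 0-based indices; Python's `max` keeps the first of several tied maxima, which is one arbitrary way of assigning the tie points just mentioned.

```python
from itertools import permutations


def program_by_estimation(w):
    """Return the permutation (0-based) whose observed sum w[0][p[0]] + ... + w[n-1][p[n-1]] is largest."""
    n = len(w)
    return max(permutations(range(n)), key=lambda p: sum(w[i][p[i]] for i in range(n)))


w_observed = [[5.1, 3.9],
              [4.2, 6.0]]                # hypothetical observations
print(program_by_estimation(w_observed))   # (0, 1), since 5.1 + 6.0 > 3.9 + 4.2
```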


3. Lene Pits a cad antinecanai tat eee
tion. Let $(J'_1, \cdots, J'_n)$ be a purely random permutation (i.e., for any given permutation $(j_1, \cdots, j_n)$ the probability that $(J'_1, \cdots, J'_n) = (j_1, \cdots, j_n)$ is $1/n!$). Let

$$S = c_{1J'_1} + \cdots + c_{nJ'_n}, \tag{6}$$

and let

$$F(s) = \Pr(S \le s) \qquad (-\infty < s < \infty). \tag{7}$$

The possible values of $S$ are the values of permutation sums of the $c_{ij}$'s; hence $F(s)$ is a purely discrete distribution having not more than $n!$ saltuses. Let $d$


* This definition can be generalized to provide a definition of statistical programming 
that is associated with general linear programming. In this paper, however, the generaliza- 
tion will not be carried out. 







represent the number of distinct possible values of $S$ (thus $1 \le d \le n!$). When $d = 1$, let $s_1$ be the one value. When $d > 1$, let the values be represented by $s_1, \cdots, s_d$ in increasing order of magnitude. The selection of a value of $(J'_1, \cdots, J'_n)$ will be termed “purely random programming.”

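For each $(i, j)$ let

$$Y_{ij} = W_{ij} - c_{ij}, \tag{8}$$

and represent the sample space of the $Y_{ij}$'s by $Y^*$. Let $(y_{11}, y_{12}, \cdots, y_{nn})$ be any point of $Y^*$ and let the distribution function of $(Y_{11}, Y_{12}, \cdots, Y_{nn})$ be $K(y_{11}, y_{12}, \cdots, y_{nn})$. We shall assume that $K$ is completely symmetric in its variables and continuous. These assumptions imply (since, by the symmetry of $K$, the permutation maximizing the permutation sum of $Y_{ij}$'s is equally likely to be any of the $n!$ permutations) that (for $d > 1$)

$$F(s) = \Pr\{(Y_{11}, Y_{12}, \cdots, Y_{nn}) \in R_s\} \qquad (s_1 \le s \le s_d), \tag{9}$$

where $R_s$ is the set of points in $Y^*$ such that

$$\max_{q \le s}\,(y_{1j_{1q}} + \cdots + y_{nj_{nq}}) > \max_{q' > s}\,(y_{1j_{1q'}} + \cdots + y_{nj_{nq'}}), \tag{10}$$

each maximum being taken over all permutations in the stated range, where $(j_{1q}, \cdots, j_{nq})$ represents a permutation of $(1, \cdots, n)$ such that

$$c_{1j_{1q}} + \cdots + c_{nj_{nq}} = q \qquad (q = s_1, \cdots, s_d). \tag{11}$$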


Let $(J_1, \cdots, J_n)$ be a permutation to be selected under programming by estimation with an observed value of $(W_{11}, W_{12}, \cdots, W_{nn})$, as in (5). Let

$$Z = c_{1J_1} + \cdots + c_{nJ_n}, \tag{12}$$

and let $G(s)$ be the cumulative distribution function of $Z$; thus

$$G(s) = \Pr(Z \le s) \qquad (-\infty < s < +\infty). \tag{13}$$

$G(s)$ has the same saltus points as $F(s)$, namely $s = s_1, \cdots, s_d$. It follows from (12) and (13) that (for $d > 1$)

$$G(s) = \Pr\{(W_{11}, W_{12}, \cdots, W_{nn}) \in R'_s\} \qquad (-\infty < s < +\infty), \tag{14}$$

where $R'_s$ is a region in $W^*$ such that

$$\max_{q \le s}\,(w_{1j_{1q}} + \cdots + w_{nj_{nq}}) > \max_{q' > s}\,(w_{1j_{1q'}} + \cdots + w_{nj_{nq'}}). \tag{15}$$

Clearly, since $w_{ij} = y_{ij} + c_{ij}$, we have that

$$G(s) = \Pr\{(Y_{11}, Y_{12}, \cdots, Y_{nn}) \in R''_s\}, \tag{16}$$

where $R''_s$ is the region in $Y^*$ such that

$$\max_{q \le s}\,(y_{1j_{1q}} + \cdots + y_{nj_{nq}} + q) > \max_{q' > s}\,(y_{1j_{1q'}} + \cdots + y_{nj_{nq'}} + q'). \tag{17}$$







$G(s)$ is the performance function for programming by estimation and $F(s)$ is the performance function for purely random programming. Under certain conditions $G(s)$ and $F(s)$ can be compared by means of the theorem below.

Theorem: If $K(y_{11}, y_{12}, \cdots, y_{nn})$ is completely symmetric in its variables and continuous, then for every $s$ in $(s_1 \le s \le s_d)$

$$G(s) \le F(s). \tag{18}$$

If, furthermore, every non-degenerate interval in $Y^*$ contains positive probability, and if $d > 1$, then for every $s$ in the interval $(s_1 \le s < s_d)$

$$G(s) < F(s). \tag{19}$$


Proof: When $d = 1$, formula (18) is obviously correct. When $d > 1$, formula (18) is correct when $s = s_d$, since both $G$ and $F$ equal 1. When $d > 1$ and $s$ is any point of $(s_1 \le s < s_d)$, we have from (9), (10), (16), and (17) that $R''_s$ is a subset of $R_s$, since for each $s$ the quantity $q' - q$ is non-negative. This completes the proof of (18). To prove (19) we shall show that for every $s$ in $(s_1 \le s < s_d)$ there is an $n^2$-dimensional interval in $R_s - R''_s$. Let $t = \min_\sigma (s_{\sigma+1} - s_\sigma)$ ($\sigma = 1, \cdots, d - 1$), and let $r$ be in the interval $(0 < r < t/(n + 1))$. For any given $s$ in $(s_1 \le s < s_d)$ and for some $q \le s$, consider the following $n^2$-dimensional interval in $Y^*$:

$$t/n - r/n < y_{ij_{iq}} < t/n \qquad (q \le s;\ i = 1, \cdots, n),$$
$$0 < y_{ij} < r/n \qquad (i, j = 1, \cdots, n;\ j \ne j_{iq}).$$

It can be shown easily that throughout this interval the maximum permutation sum of $y_{ij}$'s is $y_{1j_{1q}} + \cdots + y_{nj_{nq}}$. Thus the interval lies in $R_s$ (see (10)). It will now be shown that the interval does not lie in $R''_s$. Let $\sigma$ be such that $s_\sigma \le s < s_{\sigma+1}$. Note that for each point of the interval

$$y_{1j_{1q'}} + \cdots + y_{nj_{nq'}} + s_{\sigma+1} > s_{\sigma+1} \qquad (q' = s_{\sigma+1}), \tag{20}$$

but

$$\max_{q \le s}\,(y_{1j_{1q}} + \cdots + y_{nj_{nq}} + q) < t + s_\sigma \le t + (s_{\sigma+1} - t) = s_{\sigma+1} \qquad (s_\sigma \le s < s_{\sigma+1});$$

hence no point of the interval lies in $R''_s$ (see (17)). We have thus shown that the interval lies in $R_s - R''_s$. It follows that the probability content of $R_s$ exceeds that of $R''_s$; hence for each $s$ in $(s_1 \le s < s_d)$ we have that $G(s) < F(s)$.


Let $E(S)$ and $E(Z)$ denote the expected values of $S$ and of $Z$, respectively. It can be shown easily that

$$E(S) = \sum_{i,j} c_{ij}/n. \tag{21}$$

It should be noted that when (18) holds, $E(Z) \ge E(S)$, and that when (19) holds, $E(Z) > E(S)$. An interesting feature of the statistical analogue of the transportation problem is that we can be certain of attaining the sum given in (21). This can be accomplished by setting every $x_{ij}$ equal to $1/n$.
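Formula (21) can be checked by averaging all $n!$ permutation sums, and the uniform program $x_{ij} = 1/n$ gives the same value of (1). The sketch uses exact fractions and a hypothetical integer matrix; the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def expected_random_sum(c):
    """E(S): the average of all n! permutation sums (integer c_ij keep the Fraction exact)."""
    n = len(c)
    total = sum(sum(c[i][p[i]] for i in range(n)) for p in permutations(range(n)))
    return Fraction(total, factorial(n))


def uniform_program_value(c):
    """The sum in (1) for the feasible program x_ij = 1/n."""
    n = len(c)
    return sum(Fraction(c[i][j], n) for i in range(n) for j in range(n))


c = [[7, 2, 5],
     [3, 8, 4],
     [6, 1, 9]]          # hypothetical integer c_ij
print(expected_random_sum(c), uniform_program_value(c), Fraction(sum(map(sum, c)), len(c)))
```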







When the $W_{ij}$'s are mutually independent and each $(W_{ij} - c_{ij})$ has the same continuous distribution, $K(y_{11}, y_{12}, \cdots, y_{nn})$ satisfies the conditions that imply (18). When $W_{11}, W_{12}, \cdots, W_{nn}$ are normal and independent with means $c_{11}, c_{12}, \cdots, c_{nn}$, respectively, and a common variance, $K(y_{11}, y_{12}, \cdots, y_{nn})$ satisfies the conditions that imply (19). It should be noted that the theorem does not require that the $W_{ij}$'s be independent.

When (18) holds, programming by estimation is uniformly at least as good as purely random programming; when (19) holds, programming by estimation is uniformly better.
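A Monte Carlo sketch of the normal case covered by (19): with independent $W_{ij} \sim N(c_{ij}, \sigma^2)$, the mean of $Z$ under programming by estimation should exceed $E(S)$ from (21). The matrix, $\sigma = 2$, the seed, and the replication count are arbitrary illustrative choices.

```python
import random
from itertools import permutations


def permutation_sum(m, perm):
    return sum(m[i][j] for i, j in enumerate(perm))


def mean_sum_by_estimation(c, sd, reps, seed=0):
    """Monte Carlo E(Z): draw W_ij ~ N(c_ij, sd^2), pick the permutation maximizing the observed sum."""
    rng = random.Random(seed)
    n = len(c)
    perms = list(permutations(range(n)))
    total = 0.0
    for _ in range(reps):
        w = [[rng.gauss(c[i][j], sd) for j in range(n)] for i in range(n)]
        chosen = max(perms, key=lambda p: permutation_sum(w, p))
        total += permutation_sum(c, chosen)      # score the choice with the true c_ij
    return total / reps


c = [[7, 2, 5],
     [3, 8, 4],
     [6, 1, 9]]                                  # hypothetical c_ij
e_s = sum(map(sum, c)) / len(c)                  # E(S) from (21)
e_z = mean_sum_by_estimation(c, sd=2.0, reps=2000)
print(e_s, round(e_z, 2))
```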

It is easy to find situations in which random programming is better than programming by estimation. For example, let $W_{ij}$ have a negative exponential distribution, $(1/c_{ij})e^{-w_{ij}/c_{ij}}$, which has mean $c_{ij}$, and suppose that $n = 2$ and that $W_{11}$, $W_{12}$, $W_{21}$, and $W_{22}$ are mutually independent. When $c_{11} = 20$, $c_{12} = c_{21} = 10$, and $c_{22} = 1$, we find that with programming by estimation the expected sum of $c_{ij}$'s is 20.467, approximately. This is less than 20.5, which is the expected sum of $c_{ij}$'s when random programming is used. We can also find situations in which the hypothesis of the above theorem is not fulfilled but the conclusion holds. An example of this arises when the four $W_{ij}$'s described above are associated with the following $c_{ij}$'s: $c_{11} = 9$, $c_{12} = 7$, $c_{21} = 8$, $c_{22} = 5$. Here the expected sum of $c_{ij}$'s is 14.531, approximately, when programming by estimation is used. This exceeds the expected sum, 14.5, when random programming is used.
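The two exponential examples can be evaluated in closed form. With $n = 2$, programming by estimation picks the identity permutation exactly when $W_{11} + W_{22} > W_{12} + W_{21}$. For independent exponentials with means $m_1 \ne m_2$, $\Pr(W_1 + W_2 > t) = (m_1e^{-t/m_1} - m_2e^{-t/m_2})/(m_1 - m_2)$, and for $B = W_3 + W_4$, $E[e^{-B/a}] = 1/[(1 + m_3/a)(1 + m_4/a)]$; averaging the first expression over $B$ gives the required probability. This derivation and the function names are a sketch added for checking, not necessarily the route by which the quoted values were obtained.

```python
def prob_sum_exceeds(m1, m2, m3, m4):
    """P(W1 + W2 > W3 + W4) for independent exponentials with means m1..m4 (requires m1 != m2)."""
    def laplace_b(a):
        # E[exp(-B / a)] for B = W3 + W4
        return 1.0 / ((1.0 + m3 / a) * (1.0 + m4 / a))
    # average the survival function of W1 + W2 over the distribution of B
    return (m1 * laplace_b(m1) - m2 * laplace_b(m2)) / (m1 - m2)


def expected_sums(c11, c12, c21, c22):
    """n = 2: expected permutation sum of c's under estimation and under purely random programming."""
    p_identity = prob_sum_exceeds(c11, c22, c12, c21)   # identity chosen iff W11 + W22 > W12 + W21
    by_estimation = p_identity * (c11 + c22) + (1 - p_identity) * (c12 + c21)
    purely_random = 0.5 * (c11 + c22) + 0.5 * (c12 + c21)
    return by_estimation, purely_random


est1, rnd1 = expected_sums(20, 10, 10, 1)
est2, rnd2 = expected_sums(9, 7, 8, 5)
print(round(est1, 3), rnd1)   # about 20.467 versus 20.5
print(round(est2, 3), rnd2)   # about 14.53 versus 14.5
```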


4. A Statistical Analogue of the Generalized Optimum Assignment Problem. Consider a $B$-dimensional array ($B \ge 2$) having $n$ “layers” in each dimension, and let $c_{i_1 i_2 \cdots i_B}$ be the element in “cell” $(i_1, i_2, \cdots, i_B)$ ($i_b = 1, \cdots, n$; $b = 1, \cdots, B$). As pointed out in [4, p. 11], this situation could be of interest when each of $n$ jobs requires a team of $B - 1$ persons and $c_{i_1 i_2 \cdots i_B}$ represents, say, the productivity of the team consisting of persons $i_1, \cdots, i_{B-1}$ on job $i_B$. Other interpretations can be made easily. The problem here is to find an “assignment set” for which the “assignment sum” of $c$'s equals its maximum value. This is the generalized optimum assignment problem. When $B = 2$, the problem is the personnel assignment problem stated in Section 1.

By straightforward generalization of our statistical analogue of the personnel 
assignment problem we obtain a statistical analogue of the generalized assign- 
ment problem. The theorem stated in Section 3 generalizes to this statistical 
analogue. 

With regard to the generalized assignment problem, it can be shown that under purely random assignment the expected total production equals

$$\sum_{i_1, i_2, \cdots, i_B} c_{i_1 i_2 \cdots i_B} \,/\, n^{B-1}. \tag{22}$$
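Formula (22) can be checked by brute force for small $n$ and $B$. The sketch assumes that an assignment set consists of $n$ cells that differ in every coordinate, and parametrizes it by $B - 1$ permutations; the cell values form an arbitrary hypothetical integer array, and the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product


def expected_random_assignment_sum(c, n, B):
    """Average assignment sum over all (n!)^(B-1) assignment sets.

    Cell i of an assignment set is (i, p_2[i], ..., p_B[i]) for permutations p_2, ..., p_B
    (0-based), so the n cells differ in every coordinate.
    """
    perms = list(permutations(range(n)))
    total, count = 0, 0
    for choice in product(perms, repeat=B - 1):
        total += sum(c[(i,) + tuple(p[i] for p in choice)] for i in range(n))
        count += 1
    return Fraction(total, count)


n, B = 3, 3
c = {cell: (7 * sum(cell) + cell[0] * cell[2]) % 11 for cell in product(range(n), repeat=B)}  # hypothetical values
lhs = expected_random_assignment_sum(c, n, B)
rhs = Fraction(sum(c.values()), n ** (B - 1))     # formula (22)
print(lhs, rhs)
```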
REFERENCES 


[1] G. B. Dantzig, “Application of the simplex method to a transportation problem” (Chapter XXIII in [2]).

[2] Tjalling C. Koopmans (ed.), Activity Analysis of Production and Allocation, Cowles Commission Monograph No. 13, John Wiley and Sons, New York, 1951.

[3] J. C. C. McKinsey, Introduction to the Theory of Games, McGraw-Hill Book Company, New York, 1952.

[4] John von Neumann, “A certain zero-sum two-person game equivalent to the optimal assignment problem” (in Contributions to the Theory of Games, Vol. II, edited by H. W. Kuhn and A. W. Tucker, Princeton University Press, 1953, pp. 5-12).

[5] D. F. Votaw, Jr., “Methods of solving some personnel classification problems,” Psychometrika, Vol. 17 (1952), pp. 255-266.

[6] D. F. Votaw, Jr., “Statistical programming” (abstract), Ann. Math. Stat., Vol. 25 (1954), p. 809.

[7] David F. Votaw, Jr., “Mathematical programming and personnel assignment,” in The RAND Symposium on Mathematical Programming, R-351, The RAND Corporation, Santa Monica, California, 1960, pp. 119-120.

[8] D. F. Votaw, Jr., and A. Orden, “The personnel assignment problem,” in Symposium on Linear Inequalities and Programming (Planning Research Division, Comptroller, Headquarters U. S. Air Force, Washington 25, D. C.), 1 April 1952, pp. 155-163.





THE LIFE DISTRIBUTION AND RELIABILITY OF A SYSTEM
WITH SPARE COMPONENTS¹


By Donald F. Morrison² and H. A. David


National Institute of Mental Health, Bethesda, Md. and 
Virginia Polytechnic Institute 


0. Summary. The distribution of the operating life of a series system of like 
elements supplied with a set of spare components has been obtained for the 
situation when failed elements are in turn replaced by spares. This distribution 
has been evaluated for some common types of component life densities, and 
tables of expected total system life have been constructed. These expectations 
have been compared with those of systems with no spares as a measure of the 
efficacy of the additional spare components. The reliability of systems with 
spares has also been studied. 


1. Introduction. Consider a system which is made up of several components in such a way that failure of any component causes the system to fail. With the system is associated a fixed number of spare elements. System failures are corrected by successively replacing failed elements from this store until it is empty. Upon this “final failure” the entire aggregate is discarded.

A convenient example of a system with two components and a single spare is 
provided by the sale of identical nylon stockings in triplets rather than pairs. 


Similarly, the four original tires and single spare of an automobile form such a 
system. In both of these examples, however, protection afforded by the extra 
elements is directed against accidental failures (runs in a relatively new stocking 
caused by abrasion with a piece of furniture, or punctures in an otherwise solid 
tire due to hazards distributed randomly along the highway). The replacement 
scheme with only a small number of spares offers substantially less protection 
against system failures due to fatigue or wear, for when one element has failed 
from these causes, its mates are usually worn as well, and a fresh component will 
provide relatively little increase in system life. 

Component life will be taken to be a positive random variable with density function $f(x)$ and absolutely continuous cumulative distribution function $F(x)$. Independence of the lives of all components and spares will be assumed. We shall write the total operating life of a system of $n$ components and $k$ spares as $L(n, k)$; clearly $L(n, 0)$ is the first order statistic from a sample of $n$ independent component lives.

Cox [4] has recently considered systems with spare components as constituting 
a problem in renewal theory. From a result of Cox and Smith [5], the approxi- 


Received November 16, 1959; revised April 15, 1960. 

¹ Work supported, in part, by the Office of Ordnance Research, U. S. Army, Contract No. DA-36-034-ORD-1527 RD.

² On training leave at Virginia Polytechnic Institute, 1959-1960.


1084 







mate expected system life may be obtained when the number of spare elements 
is large relative to the number required in the system. Other methods of renewal 
theory yield the exact expectations of total life for small systems with simple 
component life densities. 

Black and Proschan [3], [11] have also investigated an aspect of a more general 
form of the spare components problem. Their system is a collection of subsystems, 
each with a different form of component life density. Spares of each component 
type are provided, and system operation continues in the usual manner until a 
failure occurs in a subsystem whose store of spares has been exhausted. Proschan 
develops the properties of Polya type distributions to establish the optimality 
of solutions to the nonlinear programming problem of allocating spares among 
the different subsystems under a budgetary constraint. 

The system discussed in this paper also has an alternative statement in terms 
of queuing theory. The positions of the system’s components may be thought of 
as the n servers of a single queue. The totality of n + k original components 
and assigned spares constitute waiting members of the queue, n of whom will be 
served at once, while the remaining k will successively take the place of any 
member who has been served. The random variable component life is equal to 
serving time. Total system life is then the time from the start of service until 
that point when some server finds himself without a waiting member to serve. 


2. Systems with Two Components. It is convenient to distinguish between the two components of the system which are in actual use at any one time and the two positions of these components. In view of the pictorial representation of the possible failure configurations of the system, as displayed in Fig. 2.1, sequences of components that successively replace each other will be called “arms” of the system. That sequence which terminates in final failure on the interval $(L, L + dL)$ (henceforth taken as “$L$”) will be designated as “Arm 1.” Corresponding outcomes arise when the arms are reversed. Fig. 2.1 shows all modes of failure of a system of two components and four spares under this definition of Arm 1.

Quite generally, the various ways in which the system can fail at time L may 
be enumerated as follows: 

1. The first component in one arm has life L, while all k spares replace the 
initial elements in the other arm and each other in succession. 

2. One arm has a single failure replaced by one spare, followed by failure at 
time L, while the remaining k — 1 spares replace failures stemming from the 
first component in the other arm. 


(k + 1). The original component of one arm has life in excess of $L$, while $k$ spares replace the initial component and themselves on the other arm, until final failure at life $L$.

Fig. 2.1 Possible Outcomes of $L(2, 4)$ (axes: outcome, time).

The probability of the $(i + 1)$th of these outcomes, $i = 0, \cdots, k$, in which $i + 1$ failures fall on Arm 1 and $k - i$ failures on Arm 2 prior to time $L$, is

$$2\Pr(L \le x_1 + \cdots + x_{i+1} \le L + dL)\cdot\Pr(y_1 + \cdots + y_{k-i} \le L \le y_1 + \cdots + y_{k-i+1}). \tag{2.1}$$
The factor 2 arises from the fact that either arm may be designated as “first.” To avoid confusion with the lives in Arm 1, those in Arm 2 are written “$y_j$.” The first probability is equal to $f_{i+1}(L)\,dL$, where $f_{i+1}$ is the density of the convolution of $i + 1$ independent variates, each distributed according to $f(x)$. The second can be written as $F_{k-i}(L) - F_{k-i+1}(L)$, where $F_j(L) = \int_0^L f_j(u)\,du$ (with $F_0(L) \equiv 1$). If these differences of cumulative distributions are abbreviated as $P_j(L) = F_j(L) - F_{j+1}(L)$, the required density of $L(2, k)$ is

$$p(L) = 2\sum_{i=0}^{k} f_{i+1}(L)\,P_{k-i}(L), \qquad 0 \le L < \infty. \tag{2.2}$$


p(L) and EL(2, k) will now be evaluated for certain component life densities. 
The density of L(n, k) in the case f(z) = e * can be derived from the properties 
of the exponential distribution without recourse to the previous arguments; 
it is well known to be 
(2.3) p(L) = (k!)~'n**L*exp(— nL), 0<L<-, 
For f(z) a gamma distribution with integer power parameter m and unit scale 
parameter, 

f(z) = (m!)"2"e™, 0Os2r< a, 


f(L) = (im +i -—1)CL'"* ee’, 0OsL<@, 
im+i-—l 


FL) 1-—e¢” a Li/j\, m an integer, 


=) 


PL) = “hk LitK"*” 115 + i(m + 1)]!. 
}= 





RELIABILITY WITH SPARE COMPONENTS 


Inserting these in (2.2) gives 


m 1 I+k m+) +m 
4 


. 
25 (L = Qe?” “i 
(2.5) p(L) = 2 2 & fim +) +m) !|j + (k — i)(m + 1)!’ 


while the rth moment of total life is 


k m . — jk m+1)—m—r 
: ne [j + k(m + 1) +m + rj!207*" 
y sha ( C => - . ‘ 
(26) BL" (2k) = 2 ie + 1) Fm + E = De FDI 


Theoretical and empirical justifications for the gamma life density may be found 
in [1], [2], and {6}. 

Tables 2.1 and 2.2 contain values of EL(2, k) for certain m and k. Except 
for small m, no simpler expression for the mean appears to be available. If m = 1, 
or component life a $x; variate, the sums of (2.6) are the odd-index terms in 
the binomial expansion of (} + 4)" with n = 2k + 1, 2k + 2, respectively, and 
it follows that 


(2.7) EL(2,k) = k + 5/4. 


Cox [4] has also obtained this expectation by way of renewal theory. 
We shall define the relative advantage of a system with k spares over one with 
none as 


(2.8) a(n, k) = nEL(n, k)/[(n + k)EL(n, 0)}. 


From the density of the (k + 1)th order statistic u = x4%4;),) in a sample of n 
independent and identically distributed random variables, viz., 


(2.9) gu) = n(" a — F(u)]"*" F*(u)f(u), 


it follows for the gamma population (2.4) and k = 0,n = 2, that u = L(2, 0) 
has density 


(2.10) g(u) = 2(m!)*u"e™ >> u’/j}!, 0s u< ~, 
=—0 


? 


with rth moment 


(2.11) Ew’ = (m!)72-"" > (m+ r+ j)2/jl. 


3=0 


Since the zeroth moment of any random variable must be unity, 
1 = (m!)'2"2 (m + j)12°7/j1, 
j=0 
so that for r = 1, (2.11) may be summed to give 


Ezqyy = m+ 1 — 2°""*(2m + 1)!/(m!)’, 
~ m +1 — 4(2m + 1)/(m)!. 


(2.12) 





1088 DONALD F. MORRISON AND H. A. DAVID 


Likewise, 
(2.13) Exe ~ m + 1 + 4(2m + 1)/(4m)!. 


The values of that quantity in Table 2.1 reflect the approach of EL(2, 1) to the 
mean of the second order statistic. 
The evaluation of (2.2) and subsequent computation of EL(2, k) for the 


TABLE 2.1 


Expected Total Life, EL(2, 1), Relative Advantage, a(2, 1), and Expectations of Related Order 
Statistics: 2 Components, 1 Spare. 


f(z) = (m!)"'z"e-™ 


EL(2, 1) a(2, 1) | Exain 


000 33 0.500 
2.250 .20 1.250 
3.492 13 2.062 

711 2.906 

906 04 3.770 

081 4.646 

740 9.150 


TABLE 2.2 
Expected Total Life, EL(2, k) and Relative Advantage, a(2, k): 2 Components, k Spares. 


f(z) = (m!)—“'z"e* 








7 4.250 
21 6.500 
16 | 8.748 
13 10.989 
ld 13.220 
07 


EL(2, 4) 
2.500 
5.250 
8.000 
10.750 
13.504 





RELIABILITY WITH SPARE COMPONENTS 1089 


translated exponential life density f(z) = exp[—(z — u)] is slightly more in- 
volved, although the expressions for the expectations are straightforward: 
EL(2,1) = » + 3/2 — 1/2e”", 
EL(2,2) = 26+ 5/4 — 1/2e* + e*(u + 3/4), 
(2.14) EL(2,3) = 2u+ 11/4 — (7/8 + w/4) +0°%/4 — ™/8, 
EL(2,4) = 3u + 33/16 + € "(7/8 + u/4) — €(23/32 + 9u/16) 
+ &™*/8 + e(5/32 + 5u/16). 

3. The Distribution of L(n, k). Although it is possible to obtain the density 
function of L(n, k) by generalizing the argument of Section 2 to n arms, it will 
be of greater utility to find instead the cumulative distribution function of system 
life in the guise of syslem reliability, or the probability that system life will 
exceed zx units of time. Where switching from failed to spare components is 
instantaneous and perfect, the notion of reliability provides a measure of the 
advantage of carrying a certain number of spares. Furthermore, the expected 
system life is merely the integral of the reliability over all positive z. 

We shall write the reliability function of a system with n components and k 
spares as R, (xz) = Pr(L(n, k) > z). In the sequel, it will often be convenient 
to abbreviate reliability, the P;(2) of Section 2, and the cumulative distribution 
function of component life as R,, , P; , and F, respectively. It is well known that 
R.o = (1 — F)*. Now a system with a single spare component can operate 
throughout the interval (0, xz) in either of two ways: 

(1) The original n components function without failure. 

(2) n — 1 of the original components and the single spare function without 
failure. 

The probability of the first of these disjoint events is of course R, (x). In 
the second, it will be helpful to visualize an extension of the arm concept of 
Section 2 to n arms. The probability that the length of the arm with the single 
failure exceeds z is P;(x) = F(x) — F(x); obviously the probability that the 
remaining n — 1 arms each has life greater than z is R,_, (2). Since the arm 
with the single failure can be chosen in n ways, the probability of the second 
event is nR,1,0(z)Pi(xz). Hence, 


(3.1) Raa = Rao + nR,-+10P; e 


The reliability of a system with k + 1 spares is likewise expressible as the 
reliability of one with k spares, plus the probabilities of the ways in which ex- 
actly k + 1 elements can fail. Each of these probabilities is associated with a 
partition of the integer k + 1, and is a product of the number of ways that par- 
ticular partition may be assigned to the n arms of the system and certain terms 
in R.. and P, . Since the partitions of k = 2 are 1’, 2, the reliability of a system 
with the two spares is 


(3.2) Ri» = Raa + nRi1» P2 + (3) R.-20 Pi. 





1090 DONALD F. MORRISON AND H. A. DAVID 
Similarly, 


(3.3) R,.3 = 2 « ) R,, 3,0 Pr; + 2 (D)Ra-20 P; P, + nR,, 1,0 P; ’ 


_ 


Ris = Ris + (‘) R, 4,0 P} + n e < ') R,-3.0 Pi P, 
(3.4) 


— (5) 2,0 P3 + 2 (°)e. 2,0 P, P, oa nR,, ~1,0 P, ° 


Note that the affixes of the P-symbols give the various partitions of k = 3, 4. 
Additional reliabilities for k = 5(1)8 are given in the technical report [10]. 

Analytic treatment of these expressions for system reliability seems feasible 
for only the simplest f(r). It can be shown [10] for f(z) = ze™*, 2 components, 
k spares, that 


Reox(x) — Rexr(xz) = 43{(2k) Pe (2r)™ + [(2k + 1) Je (22)""" 

+ 4[(2k + 2) "e™*(22)"™. 
Charts of system reliability may be found in [10] for small n, k, and component 
densities f° (x) = xe“, f(x) = (3!) 72*e™. 

Since the expectation of the random variable with cumulative distribution 
function F(z) is merely Ex = ff (1 — F(x)) dz, it will be convenient to com- 
pute the expected system lives EL(n, 1), EL(n, 2) in that manner. With the 
aid of the well-known relation for the expected values of order statistics [8], 


(3.5) 


(3.6) Exiissiny = Ex-iny + (°) (1 — F(x))"™* F*(2) dz, 
k} to 


it is possible to express EL(n, 1) and EL(n, 2) conveniently in terms of expecta- 
tions of the first few order statistics and certain integrals in P; and R..: 


le) nf P, Reso dL 
0 
(8.7) 


= Exeajny — nf F, Ryo dL. 
» n 16 2 
aa Exjn) + ( ) [ Pi en -2,0 dL 
2) Jo 
ome n| F; Ry-1.0 dL 
0 
= Ex,3\n) = nf F; Ri-1.0 aL 


— n(n — 1) I F, FR,-2,.0dL 


+ (>) I Ryo Fi dL. 





RELIABILITY WITH SPARE COMPONENTS 


4. The Evaluation of Ein, 1) and EL(n, 2) for Certain f(x). 

a. Exponential Life with Guarantee Period yw. Substitution of f(z) = 
exp[—(z — #)|],u Sz < @, in (3.7) and (3.8) yields, with proper modification 
of the limits of integration, 


(4.1) EL(n,1) = uw + (2n — 1)[n(n — 1)" — exp[— a(n — 1] [n(n — DI", 
EL(n, 2) = » + (3n® — 6n + 2)[n(n — 1)(n — 2)]" 
— exp[—(n — 2)ulin{(n — 1)(n — 2)]" 
— (3n — 2){n*(n — 1)(n — 2)J"} + exp[—(n — 1)yjn™ 
— exp[—2(n — 1)y)[n?(n — 1)]". 


Tables of these expectations and their associated relative advantages may be 
found in [10]. 


b. f(z) = ze”. EL(n, 1), EL(n, 2), and associated order statistic expecta- 
tions can be expressed in terms of a finite sum 


S(n) = | (1 + x)" exp[ — (n — 1)a] dz 
0 
(4.3) al 
= (n—1)!'(n—1)" D> (nm — 1)*7/j!. 
j= 


With the aid of. the recursion relation (3.6), integration of (2.9) gives the ex- 
pectations of the first three order statistics in terms of S(-): 


Ezra a) = S(n +1), 


(4.4) Excin) = nS(n) — (n — L)S(n + 1), 


Ezan) = (5) sin — 1) — n(n — 2)S(n) + (" a ‘sin +1). 


The additional integrals in (3.7) and (3.8) may be evaluated from the expres- 
sions (2.4) for F; and P;,m = 1: 


(45) EL(n,1) = S(n + 1)[3/2 + 1/(3n)] + 1/(3n), 
EL(n, 2) = S(n + 1)[15/8 + 5/(6n) + 1/(18n*) — 2/(15n")] 
+ 3/(4n) + 11/(90n®) — 2/(15n"). 


These quantities and their relative advantages a(n, 1), a(n, 2) have been evalu- 
ated in Table 4.1. 

It is possible to compute approximate values of the expectations derived from 
S(n) for large n. In (4.3), replace (n — 1)! with its Stirling approximate: 


n—l 


(4.7) S(n) ~ (2x/(n — 1))*X exp[—(n — 1)](n — 1)4/j1. 


(4.6) 





1092 DONALD F. MORRISON AND H. A. DAVID 


TABLE 4.1 
Expected Total Life, EL(n, k), Relative Advantage, a(n, k), and Expectations of Related 
Order Statistics: n Componenis, k Spares. 


S(z) = ze* 


ELin, 1) EL(n, 2) a(n, 2)> + Exuin) 


.30 1.250 
45 .963 
55 805 
.702 
66 .629 
574 
.73 531 
yf) 495 
ae 466 
a 441 
80 -420 
81 401 
82 .384 
82 .370 


9 


w 


250 
333 
871 
.588 
395 
254 
- 146 
.060 
.989 
AT .930 
AT 879 
48 836 
.48 797 
48 .764 


2.2 
1.663 
1.357 
! 
1 


w tw 


167 
034 
936 
860 
798 
.748 
11 705 
12 669 
13 637 
14 607 
15 585 


ee) 
eet et 


The expression within the summation represents the probability of n — 1 or 
less occurrences of a Poisson variate with parameter n — 1 Since the stand- 
ardized variate u = (x — n+ 1)/(n — 1)! is distributed as } /(0, 1) for large 
n, the sum may bereplaced by the normal probability integral } — #(—+/n — 1); 
this approaches } as n increases, so that 


S(n) ~ 4(2x/(n — 1))', 
4 


’ 


(4.8) 


Excjny ~ 4(24/n) 


a result which also can be obtained by standard methods. The approximate 

expected values of the next two order statistics follow in turn from this result 

and several applications of the asymptotic expression (n — 1)* ~ (n)' — 
4 , 

3(n)'/n: 


(4.9) 


Ex2\n) ~ 3/2 Exajny ’ 
Ex,ajny ~ 15/8 Exajny - 


These expressions imply that the relative advantages a(n, 1), a(n, 2) approach 
3/2 and 15/8, respectively, as n increases. i 

c. Rayleigh Distribution of Component Life, 1 Spare. The Rayleigh density is 
a particular case of the Weibull failure law, 


(4.10) f(x) = px” exp (—2z”), 
with p = 2. Some integrations by parts in (3.7) yield 


(4.11) EL(n, 1) = 3(x/n)'{1 + n/(2n — 1)], 





RELIABILITY WITH SPARE COMPONENTS 
while 
Exq = b(x/n)', 
(4.12) Ez = h(x) (n)*/n + [n(n — 1)'- (n)*(n — 1))//(n—-—1)} 
~ 4(/n)"[1 + n/(2(n — 1))). 


5. Bounds on EL(n, k). From the way in which replacements are added, it 
follows for k < n that L(n, k) cannot be less than the first order statistic of a 
random sample of size n (all spares have life zero), or larger than the (k + 1)th 
(no spares are available to extend life upon final failure.) Thus, 


(5.1) Exq n) s EL(n, k) 3s ExX(esrjn) ’ k <n. 


The lower bound is of little practical value for k > 2. The upper bound is also 

of greatest utility for small k, but its sharpness improves with increasing n. 
Bounds for more general k that do not require knowledge of the order statistic 

expectations are available from a different direction. We may write 


Lin, k) = [Total life of n + k components — remaining life of n — 1 
survivors]/n 


(5.2) 


TABLE 6.1 
Comparison of EL(2, k) with its Lower Bound (k + 1)n“Ez and Upper Bound (n + k)n™Exz 


S(z) = (m!)"z"e"* 


EL(2, 2) Upper Bound Lower Bound [ EL(2, 4) Upper Bound 


| 
\ 


Lower Bound | 


| 


1.500 | 1.500 | 2.000 2.500 2.500 3.000 
3.000 | 3.250 4.000 5.000 5.250 6.000 
6.000 | 6.762 8.000 10.000 | 10.750 12.000 
9.000 | 10.321 | 12.000 | 15.000 | 16.262 | 18.000 


TABLE 5.2 
Comparison of EL(2, k) with the Lower Bound Euan for Selected Component Life 
Densities f(z) 
fix) = «* Siz) = ae* 


Lower Bound Ewin | EL(2, b) 


| Lower Bound Euan 
1.250 1.500 2.906 
2.062 2.500 4.646 
3.770 4.500 8.214 





1094 DONALD F. MORRISON AND H. A. DAVID 


Assuming non-negative wear, ) 7a r; = 0, E “2a r; Ss (n — 1) Ex, so that 


(5.3) (k + 1)n" Ex s EL(n,k) S (n+ k)n™ Ex. 


The lower bound is exact for exponential life. Some indication of the closeness 
of these bounds for certain gamma densities is given in Table 5.1. 

For a system of two components and an even number k of spares, it is possible 
to show that 


(5.4) EL(2,k) = Euayy, 


where the variate u is the convolution of 4k + 1 independent random variables, 
each with density f(z). This lower bound is equal to the expected life of a system 
wherein each original component position is preassigned $k replacements for 
its sole use. Relative to EL(2, k), this bound appears to become more precise 
as f(x) departs from exponential form, and as k increases. Table 5.2 compares 
EL(2, k) with this lower bound for certain f(z). 

6. Acknowledgments. We are indebted to Mrs. Ann T. Randall for computa- 
tional assistance and to the referee for helpful comments regarding the presenta- 
tion of this paper. 


REFERENCES 

{1} Marcus A. AcnEson, Electron Tube Life and Reliability-Chapter III: ‘‘The Forms of 
Life Curves,’’ Sylvania Electric Products, New York, 1956. 

[2] Z. W. Binnspaum anv 8. C. Saunpers, “‘A statistical model for lifelength of materials,”’ 
J. Amer. Stat. Assn., Vol. 53 (1958), pp. 151-160. 

|3] Guy Buiack aNp Frank Proscuan, ‘‘On optimal redundancy,’’ Operations Research, 
Vol. 7 (1959), pp. 581-588. 

[4] D. R. Cox, ‘‘A renewal problem with bulk ordering of components,” J. Roy. Stat. Soc., 
Vol. 21 (1959), pp. 180-189. 

5) D. R. Cox anp Water L. Smits, “On the superposition of renewal processes,’ 
Biometrika, Vol. 41 (1954), pp. 91-99. 

16] BENJAMIN EpsTEIN, ‘‘Stochastic models for the length of life,’’ Proc. Stat. Tech. in 
Missile Evaluation Symposium, Virginia Polytechnic Institute, Blacksburg, Va., 
1958, pp. 69-84. 

7) B. J. Fienineer anv P. A. Lewis, ‘“Two-parameter lifetime distributions for relia- 
bility studies of renewal processes,’’ JBM J. Res. and Dev., Vol. 3 (1959), pp. 
58-73. 

{8} E. J. GumBe., Statistics of Extremes, Columbia University Press, New York, 1958. 

9] Hansras Gupta, C. E. Gwytner, ano J. C. P. Mitier, Tables of Partitions: Roy. Soc. 
Math. Tables, Vol. 4, Cambridge University Press, Cambridge, 1958. 

{10} DonaLp F. Morrison anv H. A. Davin, Life Distribution and Reliability of a System 
with Spare Components, Technical Report No. 45, Dept. of Statistics and Sta- 
tistical Lab., Virginia Polytechnic Institute, Blacksburg, Va., 1959. 

[11] Franx Proscuan, Polya Type Distributions in Renewal Theory, with an Application to 
an Inventory Problem, Unpublished Ph.D. Dissertation, Department of Sta- 
tistics, Stanford University, May, 1959. 





MAXIMIZING THE PROBABILITY THAT ADJACENT ORDER STATISTICS 
OF SAMPLES FROM SEVERAL POPULATIONS FORM 
OVERLAPPING INTERVALS' 


Ricuarp Conn, Freperick Mostre.tier, Joun W. Pratt anp 
Maurice TaTsvoKa 


Rutgers University, Harvard University, Harvard University and University of 
Hawaii 

1. Summary. Let samples of size n be drawn from each of k univariate con- 
tinuous cumulative distribution functions on the same real line, and consider the 
intersection of the k intervals between the rth and (r + 1)st order statistics in 
the several samples. Then, to maximize the probability that that intersection be 
nonempty the distributions should be identical. Furthermore, for each sample, 
consider two intervals—that between the rth and (r + 1)st and that between the 
sth and (s + 1)st order statistics—then to maximize the probability that both 
the intersection of the ‘“‘r’”’ intervals and the intersection of the “s’’ intervals be 
nonempty, the distributions again should be identical and the value of the maxi- 


mum probability is 

Eite a} 

r/\s-r 
(i )( = a) — 

kr J\kls — rj 
Some possible directions for generalization are discussed. The problem arose in 
connection with a sociological study of interaction behavior in small groups. The 
results make it possible to provide a test of the hypothesis that several samples 
of the same size are randomly drawn from possibly different populations, against 
the alternative that the samples are not independently and randomly drawn from 
distributions. 

For example, suppose we observe the frequency of a particular sort of inter- 
action for each member of five groups of size six. Suppose the five men with the 
highest frequencies each belong to a different group. Then we can say (ignoring 
discreteness) that an event has occurred whose probability under random sam- 
pling is at most 144/2639 or about 0.055. (The statistic would have but two 
values, either the five highest belong to different groups, or they do not. Such a 
test would be especially appropriate if group structure were thought to develop 


Received October 20, 1958, revised February 23, 1960. 

! This work was facilitated by a grant from The Ford Foundation and by the Laboratory 
of Social Relations of Harvard University; it was supported in part by the Rutgers Re- 
search Council, in part by a research grant M-926 from the National Institute of Mental 
Health of the National Institutes of Health, Public Health Service, and in part by grant 
G-5554 from the National Science Foundation. 


1095 





1096 COHN, MOSTELLER, PRATT AND TATSUOKA 


automatically certain specialized functions in members.) However, the main 
interest in this paper is in the problem in probability. 


2. Introduction. Suppose that we have a sample of n observations from each 
of k one-dimensional populations with arbitrary continuous distributions. Let a, 
denote the largest, b; the second largest observation in the ith sample, 1 S i S k. 


We consider the probability P(min a; 2 max b,). In particular we wish to know 
what relation must hold among the distributions of the k populations for this 
probability to be maximized. Alternative ways of describing this probability are 
these: (a) Let the k largest observations in the entire collection of kn observations 
be chosen. What is the probability that each of these is the largest in its own 
sample (i.e., is an a,)? (b) Consider the & intervals formed by the largest and 
second largest observations in each sample. What is the probability that no two 
of these intervals fail to overlap? 

It seems intuitively reasonable that the maximization is achieved when the k 
distributions are identical, and so it turns out to be, as we prove later. The prop- 
erty concerned is one which states a similarity among the k samples. It is reason- 
able to expect the samples to have the highest probability of being similar when 
the populations from which they come are identical. While it is possible that there 
is a corresponding theorem for multivariate distributions, dealing with overlaps 
of statistically equivalent blocks [6], the examples of Section 6 suggest that the 
generalization is not immediate. 

This general probability question grew out of a study of sociological data. Prof. 


Matilda Riley, of the Department of Sociology of Rutgers University, was inter- 
ested in ways of formulating mathematically the notion of social structure. One 
specific problem [5] was to show that the interaction scores of members of several 
groups were not random collections of scores drawn from single populations (one 
for each group). The results of the present paper yield one way of studying such a 
question for groups of the same size. 


3. Derivation of the probability expression. Let us denote the k cumulative 
distribution functions by y:(7), ye(a), --- , ye(a). We assume these functions 
are continuous; the reader may assume they are differentiable, but our demon- 
strations are valid even if they are not. 

Let the least of the largest observations in the k samples be u. This may, of 
course, be the largest of the n observations from y;(z), or the largest of the n 
observations from y2(x), and so forth through y,(z). That is, « = min a; where 
a; denotes the largest observation in the sample from y;(z). 

Consider as typical the case in which the sample from y;(z) contains the 
least of the largest observations; that is, the largest observation a, in the sample 
from y;(x) has the value u. Then, the probability that this occurs and that no 
second-largest observation b; in any sample exceeds u may be found by computing 
this probability for fixed u and then integrating. We use differential language 
for convenience and first find: 





ORDER STATISTIC INTERVAL OVERLAPS 


Prob {lst sample has n — 1 numbers not 
exceeding u, 1 number in the interval 
(u, u + du), and no number greater than 
u + du, and that the 2nd, 3rd, --- , kth 
samples each have n — 1 numbers not ex- 
ceeding u and 1 number greater than uw}. 


Since the samples are independently drawn from the k populations, this proba- 
bility can be written as the product of the following probabilities of the k events 
described above in the curly brackets: 


' 


mn _ (y(n) J" [dys(u) 1 — y(n)? = nyt dys, 


(n — 1)!1!0! 


n. 


i Dit [yo(u)]" [1 — yo(u)) = myz (1 — ye) 


a fye(u)}” [1 — ye(u)] = nyt "(C1 — ye), 
(n — 1)!1! 
where dy,(u) = ys(u + du) — yu) and, in the simplified expression for each 
probability, the argument u has been omitted as understood. 
Thus, the probability that the largest observation of the first sample be the 
smallest of the largest, and that none of the second-largest observations exceeds 
u, is given by 


k 
(1). n | TI yy (1 — w) |x an, 

j=2 
for any particular value u. Hence, the probability that the above event should 
happen with some value of u is the integral of the expression (1) over u from 
—« to +; that is, 


+3 k 
(2) nf | I yy a w) |x an, 
— 90 j=2 


where it is understood that each y; has the argument u and the limits — ~ 
and + are those for u.” If y, is differentiable, the integral [72 --- dy, is 
ft= --- (dy:/du)du; if y, is not differentiable, the integral is construed in the 
Stieltjes sense. 

Exactly similar considerations hold for the cases in which the 2nd, 3rd, --- , 


? The argument of this paragraph may be made rigorous as follows. The sum of (1) over 
a set of mutually exclusive and exhaustive intervals (u, u + du| approaches (2) as the 
lengths of the intervals approach 0. Now this sum is the probability that, for some one of 
the intervals (u, u + du], each sample has exactly one number greater than u and the first 
sample has no number greater than u + du; hence (2) is the limit of this probability, which 


is the probability that a; = min a; > max b; , since the latter event is the limit of the 
former 





1098 COHN, MOSTELLER, PRATT AND TATSUOKA 


kth samples each in turn contains the smallest of the largest observations, and 
they yield the probability expressions, 


Eo é 
nf [oc “ w) |e dys, 
o Ls? 


«© fk—1 
nf va - w) |x ' dye, 
* 3 


respectively. The []y}~'(1 — y,) ineach case lacks the factor y7~'(1 — y;) for 
the particular value 7 that coincides with the subscript of the differential, dy; , 
being integrated. 

Now the event whose probability we want is the union of the k events de- 
scribed in the foregoing, since we do not care which sample happens to contain 
the smallest of the largest observations. And since these k events are mutually 
exclusive, the desired probability is the sum of the k separate probabilities. We 
denote this by P, ; thus 


~“ k k k 
(4) P, = n | | 1 =) (1 - w) | av, 


m= tel 7% 


where again the integration is over u. 


4. Maximizing the probability. We now address ourselves to our main question: 
What relationship should hold among the & distributions y; in order that the 
probability P, shall be as large as possible? If, as suggested in the introduction, 
the maximum FP, is attained when all k distributions are _ identical: 
Yi = Yo = -** = ye = y, then P, reduces to 


(5) kn' / (1 a y)* yk 1) dy. 


Here the integration may be actually performed over y (instead of the under- 
lying variable u). The limits of y are 0 and 1 and the integral is a complete 


, eed ky 
beta-function. The value of expression (5) is nt / ( k 


‘) (This result can alter- 


natively be obtained by a combinatorial argument. ) 
We now prove that P, is maximized by the relation y = yo = --- = 4%. 
We define 


k 
(6) H(u) = [[ [1 — ys(u)}. 
i=1 


Motivation for the introduction of H comes from noticing that dH (see below ) 
is, except for sign, the last part of the expression under the integral sign in equa- 
tion (4). H(u) may be interpreted as the probability that k observations, one 
from each distribution, exceed u, or 1 — H(u) may be interpreted as the cumula- 
tive distribution function of the minimum of k such observations. By two ap- 





ORDER STATISTIC INTERVAL OVERLAPS 1099 


plications of the well-known inequality between the arithmetic and the geo- 
metric mean we find* 


II v = It -(1 -wisft -7y 01 -w] 


t= tl 
\4 k k 
< E -J]a- ¥™ | = (1 —H")*. 
t= 


From definition (6), we see that the total differential of H(u) is 


dH = ole I (1 - w) av = ->[ (1 - w) | du. 


t=l ij= tl iT 


Substituting in (4) we find 
(8) P.sn' [ (a — HY)" (—1) dH, 


where again the integration is over u. Putting y(u) = 1 — H"*, so that H = 
(1 — y)*, the right-hand side of (8) reduces to (5). This completes the proof. 

The proof just given was suggested by the referee. It replaces a longer proof 
that routinely applied the method of Lagrange multipliers. Our original proofs 
of Theorems 1 and 2 below also used Lagrange multipliers, but, at the sugges- 
tion of the referee we developed proofs that are more in the spirit of the above 
method. 

We now prove that P, is maximized only where all y; are equal. Note that, 
for any given set of y, , (6) defines u as a function of H. The y; are single-valued, 
continuous, monotonically decreasing functions of H except possibly at H = 0, 
even though u is not. For H = 0, let y; equal its limit as H decreases to 0. Since 
there is no contribution to (4) when H = 0, for then at least one y, is 1, it 
may be rewritten as 


lk 
| [I v3~ a4. 


j=mt 


The right-hand side of (8) becomes 


ni [ Pe Hye dH. 


Both integrations are now over H. 
* At the request of the editor we note that the inequality between the first and fourth 
member could have been rewritten as an inequality expressing the convexity of the func- 
tion [ 1 y;}'”: 
(TT wt® + (IT a — wor 1. 
That inequality is a special case of what Hardy, Littlewood, and Pélya (2] call the Hélder 


inequality, and they attribute the quite special form UT ai} * + TT B}'*s (IT (a; + B,)}'* 
to Minkowski in 1896. We refer to the current reprint [4] of that work. 





1100 COHN, MOSTELLER, PRATT AND TATSUOKA 


The difference of the two integrands is continuous and, by (7), nonnegative. 
Since [1] the integral of a continuous nonnegative function vanishes only if the 
function is identically 0, the two integrals can be equal only if the integrands 
are. But the inequalities in (7) are strict unless all y; are equal. This shows that 
if P, is maximized, the y; must be all equal except possibly at H = 0. But then 
they must all approach 1 as H decreases to 0, and hence they must be equal 
to 1 when H = 0. 


5. Two generalizations. The result obtained in the preceding section is extended 
in two ways in this section. The first extension concerns the maximization of the 
probability of overlap of the k intervals formed by the rth and (r + 1)st order 
statistics of the k samples. The result, as before, is that all distributions should 
be identical, and we have removed the restriction r = 1, imposed in Section 4. 
We sketch the proof. 

Denoting the rth observation (in descending order of magnitude) in the ith 
sample by z,;, the probability to be maximized turns out to be given by 


Prob {min z,; > max 2,41,;} 
. + 


» 2 2 k 
= r(”) =T via - w)'] yt (1 — ys)” dys. 
r feel So | iH 


Defining H(u) and y(u) as in Section 4, we find that this probability is 
bounded by 


n k * ~ H 
°(”) [ (1 aia ey dy; 


t—1 1 = Yi 


k 1 
= r(”) I (1 = A)!" A aH. 
, 0 


The last expression in (10) is the probability given in (9) when y, = ye = --- = 
ye = y. Hence, this is again the maximizing condition. We shall not take space 
for a uniqueness proof similar to that at the close of Section 4. 

In considering the intervals [z,,; , z,],r = 1,2,--- ,n — 1, included between 
the adjacent order statistics z,,, and z,, it is convenient to extend the set of 
intervals to include [x, 4, , 2,| and [x , zo], where z,,, and 2» are interpreted as 
—« and + respectively. The whole set of intervals then forms the basic 
statistically equivalent blocks for a univariate distribution. If we substitute 
r = O orr = n in expression (11) below for the minimum probability, we get 
unity in each case, as we should. 

We summarize the foregoing in 

THEoreEM 1. Jf samples of size n are drawn from each of k continuous distribu- 
tion functions y;(x), i = 1, 2, +--+ , k, the probability that 


(9) 


(10) 


min z,; > Max 2-41.; 
‘ ‘ 





ORDER STATISTIC INTERVAL OVERLAPS 1101 


for a given r (= 1,2, --- ,m — 1) is maximized if and only if y: = y2 = ---= He, 
and the maximum value is 


a (Y/(e) 


As our second extension, we consider a more general situation in which we 
pay attention to two corresponding intervals from each sample, say the interval 
formed by the rth and (r + 1)st order statistics and that formed by the sth 
and (s + 1)st order statistics. What is the condition for maximizing the proba- 
bility that the ‘“r” intervals overlap and also the “s” intervals overlap? The 
answer turns out to be the same as before: the k distribution functions should 
be identical. We sketch the demonstration. 

It can be shown that the probability in question (letting s > r) is given by 


Prob {min z,; > max Z,4:,, and min z,; > max 2,4:,;} 
4 ‘ ‘ q 


(12) 


lal 
lys(u) — y,(v))[1 — yu)} 


where u = min; z,; and v = min; z,; and 


‘ n! A 
AP seme Lae tesm: 


-4byd 


inl jul 


- « LD ele)” Tyn(u) — wee) 0 — yy 
fan 


a. dy;(v), 


We plan to integrate first with respect to v. To facilitate this we pull out 
from the integrand factors involving u alone, and we factor out enough powers 
of y:(u) so that every y:(v) and dy;(v) may have a matching denominator of 
y:(u). We then substitute z;(v) = y:(v)/y:(u), noting that z;(v) can itself be 
regarded as a cumulative distribution. After these substitutions we have as the 
integral with respect to v 


(13) [ II z(v)|""* II (1 — 2(v))]"” > dz(v)/(1 — 2,;(v)). 


By the methods used earlier, the maximum of this expression is readily found 


to be i/|o — nie rm | Then aside from constants we are left with 


the expression 


(14) [ TI v(u)"” TT Gl — wlu))” SS dyu)/ — ylu)), 


whose maximum is readily found from equations (10) and (11) to be 


'/{(e)} 


The constant A of equation (12) and the two numerical results just obtained,
when multiplied together, give the bound shown below in Theorem 2. It is easy
to verify that this upper bound is achieved by the right-hand side of equation
(12) when all the $y_i(x)$ are identical. Again we omit a uniqueness argument like
that at the close of Section 4. We state the conclusion as

THEOREM 2. If samples of size n are drawn from each of k continuous distribution
functions $y_i(x)$, i = 1, 2, ..., k, the probability that

$$\min_i x_{ri} > \max_i x_{r+1,i}$$

and

$$\min_i x_{si} > \max_i x_{s+1,i}$$

for a given r (= 1, 2, ..., n − 1) and a given s (= r, r + 1, ..., n − 1), is
maximized if and only if $y_1 = y_2 = \cdots = y_k$, and the maximum value is

$$\binom{n}{r}^{k}\binom{n-r}{s-r}^{k}\Bigg/\left[\binom{kn}{kr}\binom{kn-kr}{ks-kr}\right]. \tag{15}$$

A degenerate case occurs when s = r, because then only one interval is denoted, 
and then expression (15) should reduce to expression (11), as it does. 
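Expressions (11) and (15) can be checked by brute force for small n and k. The sketch below is our own illustration, not the authors' method: it uses the fact that, when all k distributions are identical and continuous, every arrangement of the sample labels in the pooled ordering is equally likely, so the probability is a simple count. The function names `max_overlap_prob` and `brute_force` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def max_overlap_prob(n, k, r, s=None):
    """Expression (15); with s == r (or s omitted) it reduces to (11)."""
    if s is None:
        s = r
    num = (comb(n, r) * comb(n - r, s - r)) ** k
    den = comb(k * n, k * r) * comb(k * (n - r), k * (s - r))
    return Fraction(num, den)


def brute_force(n, k, r, s):
    """Exact probability under identical continuous distributions, by counting
    every distinct label sequence of the pooled sample (largest value first)."""
    labels = [i for i in range(k) for _ in range(n)]
    sequences = set(permutations(labels))
    hits = 0
    for seq in sequences:
        # pos[i][t] = pooled rank (0 = largest) of the (t + 1)th largest value of sample i
        pos = [[p for p, lab in enumerate(seq) if lab == i] for i in range(k)]
        # min_i x_{t,i} > max_i x_{t+1,i}  <=>  every x_t precedes every x_{t+1}
        ok = all(
            max(pos[i][t - 1] for i in range(k)) < min(pos[i][t] for i in range(k))
            for t in {r, s}
        )
        hits += ok
    return Fraction(hits, len(sequences))


print(max_overlap_prob(2, 2, 1))      # 2/3, the value used in Example 1
print(max_overlap_prob(3, 2, 1, 2))   # 2/5
print(brute_force(3, 2, 1, 2))        # 2/5
```

Exact rational arithmetic is used so that agreement between the two computations is an identity rather than a tolerance.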

Theorem 2 extends to three or more corresponding intervals, and the probability
in the case of identical distributions is the obvious extension of expressions
(11) and (15). The following alternative proof generalizes directly. The
probability in Theorem 2 is the probability P that $\min_i x_{ri} > \max_i x_{r+1,i}$ times the
conditional probability Q that $\min_i x_{si} > \max_i x_{s+1,i}$ given $\min_i x_{ri} > \max_i x_{r+1,i}$.
The probability P is maximized, by Theorem 1, if and only if $y_1 = y_2 = \cdots = y_k$.
Theorem 2 follows if we show this condition also maximizes Q. Given that
$u = \min_i x_{ri} > \max_i x_{r+1,i}$, there are exactly n − r observations below u in
each sample. Thus, conditionally, $x_{r+1,i}, \ldots, x_{ni}$ are the order statistics of a
sample of n − r from the conditional distribution $y_i(x)/y_i(u)$, $x < u$. By
Theorem 1, the conditional probability Q is maximized if $y_1 = y_2 = \cdots = y_k$,
which is all that remained to be shown.
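For identical distributions the factorization can also be checked at the level of the formulas. By the conditioning argument, Q is expression (11) applied to samples of size n − r with r replaced by s − r, so P × Q should reproduce (15). A minimal sketch (the helper names are ours):

```python
from fractions import Fraction
from math import comb


def overlap_11(n, k, r):
    """Expression (11): maximal probability that the k 'r' intervals overlap."""
    return Fraction(comb(n, r) ** k, comb(k * n, k * r))


def overlap_15(n, k, r, s):
    """Expression (15): both the 'r' and the 's' intervals overlap."""
    num = (comb(n, r) * comb(n - r, s - r)) ** k
    return Fraction(num, comb(k * n, k * r) * comb(k * (n - r), k * (s - r)))


def via_conditioning(n, k, r, s):
    p = overlap_11(n, k, r)            # P: the 'r' intervals overlap
    q = overlap_11(n - r, k, s - r)    # Q: conditional samples of size n - r
    return p * q


print(via_conditioning(3, 2, 1, 2))    # (3/5)(2/3) = 2/5
```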


6. Examples in two dimensions. The generalization of our results to higher 
dimensions is not immediate. Examples show that, even though the samples are 
drawn from the same distribution and statistically equivalent blocks are con- 
structed for each sample by the same program, the probability of overlap depends 
on both the common distribution and the common program. 

Even in one dimension, the probability may depend on the program: if the
first block in each of k samples is the interval between the rth and (r + 1)st order
statistics, the probability that the intersection of the k first blocks is nonempty
depends on r, by equation (11) of Theorem 1.
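To make the dependence on r concrete, the sketch below evaluates expression (11) for k = 2 samples of size n = 4 (an arbitrary choice of ours):

```python
from fractions import Fraction
from math import comb


def overlap_11(n, k, r):
    """Expression (11) of Theorem 1 for identical distributions."""
    return Fraction(comb(n, r) ** k, comb(k * n, k * r))


n, k = 4, 2
for r in range(1, n):
    print(r, overlap_11(n, k, r))   # 4/7, 18/35, 4/7
```

The values are symmetric under r ↔ n − r and smallest for the middle interval.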

Here the dependence of the probability on the program can be removed by renumbering
the blocks correctly. In the following example, this dependence is perhaps
even clearer, since it cannot be removed by renumbering blocks.

EXAMPLE 1. From any continuous two-dimensional distribution, take two samples
of size two. Program 1. For each sample, construct statistically equivalent
blocks by using as boundary functions the vertical lines through the two sample
points, thus obtaining three regions in the plane: the right, the left, and the
middle. The right-hand regions of the two samples always overlap, as do the left-hand
regions, but the middle regions overlap with probability 2/3 because the
construction program essentially reduces the problem to one dimension and
equation (11) applies with n = 2, k = 2, r = 1.

Program 2. In each sample, use the vertical line through the right-hand sample 
point for the first boundary function, and, for the second, the horizontal half-line 
running from −∞ through the left-hand sample point to the first vertical line.
This program produces three regions for each sample: the right-hand region, the 
upper-left, and the lower-left. Each region always overlaps its mate in the other 
sample. 

Thus we have shown that, even when samples are drawn from the same distri- 
bution, different methods of constructing statistically equivalent blocks can pro- 
duce different probabilities of overlap, here 2/3 for one region and 1 for two regions 
of Program 1 and 1 for all regions of Program 2. 
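The value 2/3 for the middle regions of Program 1 can be checked by simulation. A minimal sketch, assuming a bivariate normal with independent coordinates (our arbitrary choice; any continuous distribution should give the same value). The middle region of a sample is the vertical strip between the x-coordinates of its two points, so two middle regions overlap exactly when those x-intervals intersect.

```python
import random


def middle_regions_overlap(rng):
    """One replication of Program 1: do the two middle strips intersect?"""
    strips = []
    for _ in range(2):                                    # two samples of size two
        points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 2.0)) for _ in range(2)]
        strips.append(sorted(x for x, y in points))      # only x matters here
    (a1, b1), (a2, b2) = strips
    return max(a1, a2) < min(b1, b2)


def estimate(trials=60000, seed=1):
    rng = random.Random(seed)
    return sum(middle_regions_overlap(rng) for _ in range(trials)) / trials


print(estimate())   # should be near 2/3
```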

In this example, for neither program did the probability of overlap depend on 
the distribution sampled. The following example shows this too is possible. 

EXAMPLE 2. Again draw two samples of size 2.

Program. For each sample the first boundary is a circle about the origin through 
the sample point farther from the origin; the second boundary is the chord of this 
circle passing horizontally through the remaining sample point. Now there are 
three regions, the one exterior to the circle, and the two interior to the circle, say, 
the upper-interior and the lower-interior. 

Let the samples be drawn from a uniform distribution over a circle of radius 1 
with center at a point (c, 0) on the horizontal axis. If c is large, then the upper 
region is practically the upper half of a large circle about the origin, and such 
regions are sure to overlap from sample to sample. (Actually, they are sure to 
overlap near (0, 1) if |c| > 2.) On the other hand, if c is 0, then both points of
one sample may fall within one half unit of the origin while both points in the 
other sample fall at least one half unit above the horizontal axis. Then the upper- 
interior regions do not overlap. Thus without further calculation we can say that 
the probability of non-overlap is strictly positive for c = 0 and zero for large c. 
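A small simulation illustrates this. For one sample, let R be the distance from the origin to the farther point and h the y-coordinate of the nearer point. The upper-interior region is then {p : |p| < R, y > h}, so two such regions intersect exactly when max(h₁, h₂) < min(R₁, R₂). The sketch below (helper names ours) estimates the overlap probability for c = 0 and c = 3.

```python
import math
import random


def uniform_disk(rng, c):
    """Uniform point in the unit disk centred at (c, 0)."""
    rho = math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return c + rho * math.cos(theta), rho * math.sin(theta)


def upper_interior(rng, c):
    """(R, h) describing the upper-interior region {|p| < R, y > h} of one sample of two."""
    p, q = uniform_disk(rng, c), uniform_disk(rng, c)
    far, near = (p, q) if math.hypot(*p) >= math.hypot(*q) else (q, p)
    return math.hypot(*far), near[1]


def overlap_rate(c, trials=20000, seed=7):
    """Fraction of trials in which the two upper-interior regions intersect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        (r1, h1), (r2, h2) = upper_interior(rng, c), upper_interior(rng, c)
        # {|p| < min R, y > max h} is non-empty exactly when max h < min R
        hits += max(h1, h2) < min(r1, r2)
    return hits / trials


print(overlap_rate(0.0), overlap_rate(3.0))
```

For c = 3 every radius is at least 2 while |h| ≤ 1, so the estimate is exactly 1; for c = 0 it falls strictly below 1.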

All told then we have shown that, for the two-dimensional problem, even if two 
samples are drawn from the same distribution and statistically equivalent blocks 
are constructed in each sample by the same program, the probability of overlap 
may vary from program to program for a fixed distribution, and from distribution 
to distribution for a fixed program. It is suggestive that in Example 1 the smaller 
probability of overlap is associated with the “one-dimensional” Program 1. 

One generalization of Theorems 1 and 2 to several dimensions is readily presented,
but it succeeds essentially by reducing several dimensions to one. In fact,
these theorems apply immediately to statistically equivalent blocks defined by the
order statistics of a real-valued function f of the observations. (That is, the first
block is the region where f(x) exceeds $f(X_j)$ for every member $X_j$ of the sample,
the second block is the region where f(x) exceeds $f(X_j)$ for every $X_j$ but one, etc.)
The sample may be drawn from any probability distribution on any set whatever,
as long as f is measurable and $f(X_j)$ has a continuous c.d.f. (The distribution must
be non-atomic or no $f(X_j)$ has a continuous c.d.f.)
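As an illustration of this reduction, take f(x) = |x|, the Euclidean norm in the plane; the blocks are then annuli. The sketch below (our own choice of f, distribution, n = k = 3 and r = 1) estimates the probability that the k blocks lying between the rth and (r + 1)st largest values of f share a point, and compares it with expression (11).

```python
import math
import random
from fractions import Fraction
from math import comb


def ring_blocks_overlap(rng, n, k, r):
    """Blocks from order statistics of f(x) = |x| for k planar samples of size n.

    Block r of a sample is the annulus between its (r+1)st and rth largest
    values of f; the k annuli share a point iff min of tops > max of bottoms."""
    tops, bottoms = [], []
    for _ in range(k):
        vals = sorted((math.hypot(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)),
                      reverse=True)
        tops.append(vals[r - 1])
        bottoms.append(vals[r])
    return min(tops) > max(bottoms)


def estimate(n=3, k=3, r=1, trials=40000, seed=2):
    rng = random.Random(seed)
    return sum(ring_blocks_overlap(rng, n, k, r) for _ in range(trials)) / trials


exact = Fraction(comb(3, 1) ** 3, comb(9, 3))   # expression (11) with n = k = 3, r = 1
print(estimate(), float(exact))
```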


7. Remarks. In connection with another problem, Lehmann ([3], p. 172,
Lemma 4.1) proves what amounts to the special case of Theorem 1 with n = 2,
k = 2, r = 1.

Several possible generalizations besides those already considered suggest them- 
selves. For example, Theorem 1 concerns a set of k corresponding intervals, one 
interval for each of k samples, and states that the probability that all k intervals 
overlap is maximized when the distributions sampled are identical. Is the same 
true of the probability that at least m of these k intervals overlap? Theorem 2 
states that the probability that overlap occurs in both of two sets of corresponding 
intervals is also maximized when the distributions are identical. Is the same true 
of the probability that overlap occurs in at least one of the two sets of intervals? 

The probability that non-corresponding intervals overlap is not necessarily 
maximized when the distributions are identical. For example, for one sample of 
one, take the interval from the observation to +∞; for another sample of one,
take the interval from the observation to −∞. These intervals overlap if and only
if the second observation exceeds the first. This has probability 1/2 if both obser- 
vations have the same distribution, but may be larger (even 1) if the observations 
have different distributions. 
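For example, if the two observations are normal with unit variances and the second has mean d while the first has mean 0 (an assumption made only for illustration), the overlap probability is P{X₂ ≥ X₁} = Φ(d/√2), which exceeds 1/2 for every d > 0:

```python
from statistics import NormalDist


def prob_second_exceeds_first(shift, sd1=1.0, sd2=1.0):
    """P{X2 > X1} for independent X1 ~ N(0, sd1^2) and X2 ~ N(shift, sd2^2).

    [X1, +inf) and (-inf, X2] overlap exactly when X2 >= X1."""
    difference = NormalDist(mu=shift, sigma=(sd1 ** 2 + sd2 ** 2) ** 0.5)  # law of X2 - X1
    return 1.0 - difference.cdf(0.0)


for shift in (0.0, 1.0, 3.0):
    print(shift, round(prob_second_exceeds_first(shift), 4))
```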

Some work is being done on problems of samples of unequal sizes. 

Finally, of course, all such questions can be raised about statistically equivalent 
blocks. 

We wish to express our appreciation to William Kruskal for numerous sugges- 
tions. 


REFERENCES 


[1] G. H. Hardy, A Course of Pure Mathematics, 10th Edition, Cambridge University Press,
Cambridge, 1955, p. 340.

[2] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, Cambridge University
Press, Cambridge, 1934, p. 21.

[3] E. L. Lehmann, “Consistency and unbiasedness of certain non-parametric tests,” Ann.
Math. Stat., Vol. 22 (1951), pp. 172-173.

[4] Hermann Minkowski, Geometrie der Zahlen, Chelsea Publishing Co., New York, 1953,
p. 117, footnote.

[5] M. W. Riley, J. W. Riley, and Jackson Toby, Sociological Studies in Scale Analysis,
Rutgers University Press, 1954, pp. 25-26.

[6] J. W. Tukey, “Non-parametric estimation II. Statistically equivalent blocks and
tolerance regions--the continuous case,” Ann. Math. Stat., Vol. 18 (1947), pp.
529-539.





NORMALIZING THE NONCENTRAL t AND F DISTRIBUTIONS
By NICO F. LAUBSCHER


South African Council for Scientific and Industrial Research and 
Cornell University 


1. Introduction and Summary. Let X be a random variable governed by
one of a family of distributions which is conveniently parameterized by μ, the
expectation of X, so that, in particular, the variance of X, σ², is a function of μ,
which we denote by σ²(μ). A transformation, ψ(X), is sometimes sought so that
the variance of ψ(X), as μ sweeps over its domain, is independent of μ (or much
more nearly constant than σ²(μ)).

A standard method of obtaining such a transformation for stabilization of the
variance is to consider X as one of a sequence of random variables, the sequence
converging asymptotically in distribution, usually to a normal distribution.
One form of the basic theorem is stated and proved by C. R. Rao [8], pp. 207-8,
as follows.

THEOREM (Rao). If X is asymptotically normally distributed about μ, with
asymptotic variance σ²(μ), then any function ψ = ψ(X), with continuous first
derivative in some neighborhood of μ, is asymptotically normally distributed with
mean ψ(μ) and variance σ²(μ)(dψ/dμ)², where (dψ/dμ) denotes the derivative of
ψ(X) with respect to X, evaluated at the point μ.

From this we immediately have the following well-known 

COROLLARY. The random variable

$$\psi(X) = c\int_K^X \frac{d\mu}{\sigma(\mu)}, \tag{1}$$

where 0 < c < ∞, and where K is an arbitrary constant, has a variance which is
stabilized asymptotically at c².

It is assumed, of course, that the integrand in (1) is integrable. If ψ(X) is not
a real-valued function on the domain of X, then the mapping is meaningless.
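A familiar instance of (1) is the Poisson family, where σ(μ) = μ^{1/2}, so that with c = 1 the transformation is 2X^{1/2} up to an additive constant. The sketch below evaluates (1) by Simpson's rule for a user-supplied σ(·) and computes the exact variance of 2X^{1/2} for Poisson X; the function names and the Poisson example are our own illustration.

```python
import math


def stabilizing_transform(sigma, lower, x, c=1.0, steps=2000):
    """psi(x) = c * integral from K to x of dmu / sigma(mu), by composite Simpson's rule.

    `sigma` gives the standard deviation as a function of the mean and `lower`
    plays the role of the constant K in (1); `steps` must be even."""
    h = (x - lower) / steps
    total = 1.0 / sigma(lower) + 1.0 / sigma(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) / sigma(lower + i * h)
    return c * total * h / 3.0


def poisson_variance_of_root(mu, c=2.0):
    """Exact Var[c * sqrt(X)] for X ~ Poisson(mu), summing the probability function."""
    upper = int(mu + 20.0 * math.sqrt(mu) + 20)
    mean = second = 0.0
    for x in range(upper + 1):
        p = math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))
        v = c * math.sqrt(x)
        mean += p * v
        second += p * v * v
    return second - mean * mean


print(stabilizing_transform(math.sqrt, 1.0, 9.0))                   # approximately 2*3 - 2*1 = 4
print([round(poisson_variance_of_root(mu), 3) for mu in (5.0, 25.0, 100.0)])
```

The variance of X itself grows like μ, while that of 2X^{1/2} settles near c² = 1, as the corollary predicts.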

Transformations such as (1), perhaps slightly modified, not only often work 
well for stabilizing non-asymptotic variances, but also often serve as well to 
normalize non-normal distributions. In general, however, nothing is known about 
the relative closeness to normality of the distribution of a random variable before 
and after a variance-stabilizing transformation is applied. Nor can anything 
general be said about the relative rapidity of approach to asymptotic normality. 

The study of concrete examples, however, suggests some connection between
variance stabilization and normalization of non-normal distributions. A theoretical
connection that may be relevant in certain cases has been put forward by
N. L. Johnson [3], pp. 150-1. Johnson shows that, when the random variable of
interest has a certain structure, then the differential equation for the normalizing
transformation is similar to the differential equation for the variance-stabilizing
transformation. The specified structure is that $X_t = Y_1 + Y_2G(X_1) + \cdots + Y_tG(X_{t-1})$,
where the Y's are independent and small, and G(·) is some
function.

Received April 15, 1959; revised July 18, 1960.

In what follows, we obtain the variance-stabilizing transformation for the
noncentral t distributions and consider its normalizing properties. We repeat the
same procedure for the topside noncentral F distributions, although the variance-stabilizing
transformation in this case is not well-defined. We then derive two
other (well-defined) transformations for the approximate normalization of the
topside noncentral F. Numerical comparisons of these approximations and the
exact values are given.


2. Noncentral t. If U and V are independent random variables and U is N(0, 1),
V is χ²/n (where χ² denotes the central chi-square variable with n degrees of
freedom), and δ is a real number, then the random variable defined by
$t = (U + \delta)V^{-1/2}$ is known as the noncentral t variable with n degrees of freedom and
with noncentrality parameter δ. We assume throughout that n ≥ 4. The first
moment about zero, and the second and third central moments of noncentral
t were obtained by Johnson and Welch [4]. These moments are, respectively,


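$$\mu_1' = (\tfrac12 n)^{1/2}\,\delta\,\Gamma(\tfrac12 n - \tfrac12)/\Gamma(\tfrac12 n),$$

$$\mu_2 = [n(1 + \delta^2)/(n-2)] - \mu_1'^2,$$

$$\mu_3 = \mu_1'\{n(\delta^2 + 2n - 3)/[(n-2)(n-3)] - 2\mu_2\}.$$

Eliminating δ between $\mu_1'$ and $\mu_2$ we find that

$$\mu_2 = a^2 + b^2\mu_1'^2,$$

where

$$a = [n/(n-2)]^{1/2}, \qquad b = \{2\Gamma^2(\tfrac12 n)/[(n-2)\Gamma^2(\tfrac12 n - \tfrac12)] - 1\}^{1/2},$$

which is a positive real number for n ≥ 4.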
Now, from (1) (with K = 0 and c = 1), the variance-stabilizing transformation,
ξ(t), of noncentral t is

$$\xi(t) = \int_0^t (a^2 + b^2u^2)^{-1/2}\,du = \alpha\sinh^{-1}(\beta t),$$

where α = b^{-1} and β = b/a.







The random variable

$$\xi_1(t) = \xi(t) - \alpha\sinh^{-1}(\beta\mu_1') \tag{2}$$

will have, approximately, mean value zero and unit variance.

Using the second- and third-derivative terms, respectively, in the Taylor series
for expectations, the following transformations might be expected to eliminate
more bias than (2):

$$\xi_2(t) = \xi_1(t) + \tfrac12 b^2\mu_1'\mu_2^{-1/2}, \tag{3}$$

$$\xi_3(t) = \xi_2(t) - \tfrac16 b^4\mu_3\mu_2^{-5/2}\left[2\mu_1'^2 - (a^2/b^2)\right]. \tag{4}$$

To find the degree of approximation to normality of these transformations,
$P\{t \le n^{1/2}z\}$ is approximated by $P\{\xi_i(t) \le \xi_i(n^{1/2}z)\}$, where $\xi_i(t)$, i = 1, 2, 3, is the
transformation used. If n and δ together with α, 0 < α < 1, are specified, then
the equation $P\{t \le n^{1/2}z\} = 1 - \alpha$ determines z uniquely¹. Let Φ(·) denote the
unit normal distribution. If $\xi_i(t)$ is exactly N(0, 1), then $\Phi(\xi_i(n^{1/2}z)) = 1 - \alpha$
is an identity. From tables of the normal distribution, if α = 0.10, it follows that
$\xi_i(n^{1/2}z) = 1.282$, i = 1, 2, 3. Hence, given n, δ and α = 0.10, we obtain z from
the tables of Resnikoff and Lieberman [9], pp. 383-9². We enter the values of
$\xi_i(n^{1/2}z)$ in Table 1. To see how good the approximations are in terms of probabilities
(rather than in terms of the deviates) it suffices to note a few values of
Φ(·) for quick reference: Φ(1.26) = 0.896, Φ(1.27) = 0.898, Φ(1.28) = 0.900
and Φ(1.29) = 0.902.

There is a considerable degree of skewness in the noncentral t distribution for
large values of δ and small values of n (Johnson and Welch [4]). Thus we do not
expect (2) to be a very good approximation for simultaneous small values of n
and large values of δ. Table 1 confirms this suspicion. Also, for small n, the
quality of approximation deteriorates as δ increases. Larger values of n improve
this transformation. In this case (3) and (4) are seen to be very close to normality,
even for large δ.

Other numerical work, not presented here, was done for the cases α = 0.05 and
α = 0.01 and for the same values of n and δ. These results show that (2) is the
most suitable transformation when α = 0.05. The probability integral is overestimated
by (3) and (4). When α = 0.01, (2), (3) and (4) over-estimate the
probability integral, (2) being the closer approximation.

¹ We are limited, for exact values of the probability integral of noncentral t, to the tables
of Resnikoff and Lieberman [9]. These authors tabulate $P\{n^{-1/2}t \le z\}$ because the range for
the argument $n^{-1/2}t$ is about the same for all n and δ. This, of course, makes tabulation more
compact.

² The rather odd values of δ which appear in Table 1 are not strange if one understands
the mechanism of the Resnikoff-Lieberman tables! In these tables, once n is selected, the noncentrality
parameters are determined through the relationship $\delta = (n + 1)^{1/2}K_p$, where $K_p$
is the upper p-point of the unit normal distribution and $K_p$ is determined from $\Phi(K_p) = 1 - p$.
Hence δ is given as a function of n and p and only tabulated for a suitable range
of values of p. The reason for this construction depends on the original purpose for which
these tables were constructed.
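The relationship δ = (n + 1)^{1/2}K_p can be checked against the δ values in Table 1. The sketch below assumes n ∈ {9, 19, 29, 39, 49} and p ∈ {0.25, 0.10, 0.01, 0.001}; these are our reading of the table, not values stated in the text.

```python
import math
from statistics import NormalDist


def rl_delta(n, p):
    """delta = (n + 1)^{1/2} K_p, with K_p the upper p-point of N(0, 1)."""
    k_p = NormalDist().inv_cdf(1.0 - p)
    return math.sqrt(n + 1) * k_p


for n in (9, 19, 29, 39, 49):
    print(n, [round(rl_delta(n, p), 6) for p in (0.25, 0.10, 0.01, 0.001)])
```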







TABLE 1
Values of $\xi_i(n^{1/2}z)$, i = 1, 2, 3, where $n^{1/2}z$ is the 90th percentile of t

  n        δ           z       ξ₁(n^{1/2}z)   ξ₂(n^{1/2}z)   ξ₃(n^{1/2}z)
  9     2.132924       …            …              …              …
  9     4.052622       …            …              …              …
  9     7.356558       …            …              …              …
  9     9.772173       …            …              …              …
 19     3.016410     1.073          …              …              …
 19     5.731273     1.811          …              …              …
 19    10.403744     3.131          …              …              …
 19    13.819939     4.113          …              …              …
 29     3.694333     0.983          …              …              …
 29     7.019347     1.685          …              …              …
 29    12.741932     2.933          …              …              …
 29    16.925900     3.859          …              …              …
 39     4.265848     0.935          …              …              …
 39     8.105244     1.618          …              …              …
 39    14.713116     2.829          …              …              …
 39    19.544345     3.726          …              …              …
 49     4.769363     0.903          …              …              …
 49     9.061938     1.576          …              …              …
 49    16.449764     2.763          …              …              …
 49    21.851242     3.641          …              …              …


3. Noncentral F. If $X_1, \ldots, X_m$ are independently distributed and $X_i$ is
$N(\mu_i, 1)$, then the random variable $\chi'^2 = X_1^2 + \cdots + X_m^2$ is called a noncentral
chi-square variable with m degrees of freedom and noncentrality parameter
$\lambda = \mu_1^2 + \cdots + \mu_m^2$.

If χ'² has the noncentral chi-square distribution with m degrees of freedom and
noncentrality parameter λ, and if χ², independently of χ'², follows the central
chi-square distribution with n degrees of freedom, then the ratio

$$F = (\chi'^2/m)/(\chi^2/n)$$

has the topside noncentral F distribution [6] with m and n degrees of freedom
respectively, and with noncentrality parameter λ.


3.1. The cosh⁻¹ transformation. The first two moments of F are given by Patnaik
[6] as follows:

$$\mu_1' = n(m + \lambda)/[m(n-2)],$$

$$\mu_2 = n^2[(m+\lambda)^2 + 2(m + 2\lambda)]/[m^2(n-2)(n-4)] - \mu_1'^2.$$

It will be supposed throughout that n > 4. By eliminating the parameter λ
between the second central moment and the mean, we find that the variance is

$$\mu_2 = 2\{[\mu_1' + (n/m)]^2 - a^2\}/(n-4),$$

where

$$a = n(m + n - 2)^{1/2}/[m(n-2)^{1/2}].$$
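The elimination of λ can be verified numerically: for any λ, Patnaik's variance should coincide with the expression in the mean alone. A minimal sketch (function names ours); the last check uses the familiar central-F variance 2n²(m + n − 2)/[m(n − 2)²(n − 4)] at λ = 0.

```python
import math


def ncf_mean_var(m, n, lam):
    """Patnaik's mean and variance of topside noncentral F (requires n > 4)."""
    mean = n * (m + lam) / (m * (n - 2.0))
    var = (n ** 2 * ((m + lam) ** 2 + 2.0 * (m + 2.0 * lam))
           / (m ** 2 * (n - 2.0) * (n - 4.0)) - mean ** 2)
    return mean, var


def ncf_a(m, n):
    return n * math.sqrt(m + n - 2.0) / (m * math.sqrt(n - 2.0))


def var_from_mean(mean, m, n):
    """Variance written through the mean alone, after eliminating lambda."""
    return 2.0 * ((mean + n / m) ** 2 - ncf_a(m, n) ** 2) / (n - 4.0)


mean, var = ncf_mean_var(4, 20, 9.0)
print(var, var_from_mean(mean, 4, 20))
```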


From (1) (with c = 1 and K = a − (n/m)), the variance-stabilizing transformation,
τ(F), is obtained as

$$\tau(F) = (\tfrac12 n - 2)^{1/2}\cosh^{-1}\{[F + (n/m)]/a\}.$$

It may be hoped that τ(F) is approximately normal with mean value

$$\tau(\mu_1') = (\tfrac12 n - 2)^{1/2}\cosh^{-1}\{(m + n + \lambda - 2)/[(m + n - 2)(n - 2)]^{1/2}\}$$

and with unit variance. Using the second-derivative term in the Taylor series
for expectations, the transformed random variable

$$\tau_1 = \tau_1(F) = (\tfrac12 n - 2)^{1/2}\left\{\cosh^{-1}\frac{F + (n/m)}{a} - \cosh^{-1}\frac{m + n + \lambda - 2}{(n-2)b}\right\} + \frac{m + n + \lambda - 2}{[2(n-4)]^{1/2}\,[(m + n + \lambda - 2)^2 - (n-2)^2b^2]^{1/2}}, \tag{5}$$

where $b = (m + n - 2)^{1/2}/(n - 2)^{1/2}$, may be better approximated by the normal
distribution with zero mean and unit variance.

However, the transformation cosh⁻¹{[F + (n/m)]/a} looks more innocent
than it is: cosh⁻¹(z) is real only if z ≥ 1, i.e., in our case only if

$$F \ge (n/m)\{[m/(n-2) + 1]^{1/2} - 1\} > 0.$$

Thus, for small values of F, we get no sensible approximation at all³. If m = n,
the lower limit for F is $\{2(n-1)/(n-2)\}^{1/2} - 1 > 0.414$, so it is not only
very small values of F which are affected.
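The domain problem is easy to see numerically. The sketch below codes τ(F) and the lower limit for F given above (function names ours); τ raises an error below that limit, and for m = n the limit always exceeds 2^{1/2} − 1 ≈ 0.414.

```python
import math


def tau(F, m, n):
    """tau(F) = (n/2 - 2)^{1/2} cosh^{-1}{[F + n/m]/a}; not real below the lower limit."""
    a = n * math.sqrt(m + n - 2.0) / (m * math.sqrt(n - 2.0))
    arg = (F + n / m) / a
    if arg < 1.0:
        raise ValueError(f"tau(F) is not real for F = {F} (m = {m}, n = {n})")
    return math.sqrt(0.5 * n - 2.0) * math.acosh(arg)


def lower_limit(m, n):
    """Smallest F with a real tau(F): (n/m){[m/(n - 2) + 1]^{1/2} - 1}."""
    return (n / m) * (math.sqrt(m / (n - 2.0) + 1.0) - 1.0)


for n in (6, 10, 40):
    print(n, round(lower_limit(n, n), 4))
```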

Strictly speaking, therefore, this transformation is not well defined and hence
should not be used to approximate the probability integral P{F ≤ x}. However,
we have investigated the approximation of this by P{τ(F) ≤ τ(x)}. The values
for m, n, λ and x are the same as those considered by Patnaik [6] and the approximation
(5) is given in Table 2 and compared with the exact values as given
by Patnaik [6]. As may be expected, this approximation is not very satisfactory.

3.2. The Square Root Transformation. It is well known [5] that (2χ²)^{1/2} is approximately
normal with mean (2n − 1)^{1/2} and unit variance. Also [6], (2χ'²)^{1/2} is
approximately normal with mean [2(m + λ) − (m + 2λ)/(m + λ)]^{1/2} and variance
(m + 2λ)/(m + λ). Thus, to the extent that these approximations hold,

³ However, it seems possible to remedy the situation as follows: It is clear that τ(F + ε)
is well-defined if ε ≥ a − (n/m). Hence consider a power series expansion of τ(F + ε). Take
mathematical expectations to find Var[τ(F + ε)] as an ascending series in powers of λ⁻¹,
where λ is the noncentrality parameter. Then it might perhaps be possible to select a value
of ε for which τ(F + ε) is well-defined and which will eliminate bias of O(λ⁻¹). Precisely this
type of argument is used to derive the well-known transformation (X + 3/8)^{1/2} from X^{1/2} for
the Poisson distribution. The details in the present problem might be overwhelming!







TABLE 2
Approximate and Exact Values of the Probability Integral P{F ≤ x}

Approximation: (5) P{τ ≤ τ(x)}   (6) P{r₁ ≤ r₁(x)}   (7) P{r₂ ≤ r₂(x)}   Exact P{F ≤ x}

0.734   0.743   0.750
0.9022  0.915   0.919
0.274   0.205   0.202
0.527   0.520   0.520

0.696   0.696   0.707   0.700
0.881   0.888   0.889   0.887
0.151   0.124   0.119   0.126
0.357   0.346   0.349   0.347

0.716   0.730   0.73    0.731
0.895   0.910   0.914   0.914
0.232   0.158   0.155   0.158
0.481   0.467   0.463   0.461

0.658   0.661   0.669   0.664
0.861   0.870   0.872   0.870
0.096   0.069   0.064   0.069
0.264   0.245

0.698   0.716   0.714
0.888   0.909   0.908
0.197   0.117   0.119
0.438   0.409   0.408

8

0.574   0.581   0.578
0.8066  0.815   0.813
0.027   0.014
0.105   0.085

9   9   36   36   9   9   36   36


$$(mF/n)^{1/2} = (2\chi'^2)^{1/2}/(2\chi^2)^{1/2}$$

is the ratio of two independent normal random variables.

From a theorem due to Fieller [2], if X and Y are normally and independently
distributed with means $m_x$ and $m_y$ and standard deviations $\sigma_x$ and $\sigma_y$, respectively,
then the function

$$R = (m_xV - m_y)/(\sigma_x^2V^2 + \sigma_y^2)^{1/2}, \quad\text{where } V = Y/X,$$

will be nearly normally distributed with zero mean and unit variance, provided
the probability of X being negative is small.
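The reason is that, when X > 0, the event V ≤ v is the event Y − vX ≤ 0, and Y − vX is exactly normal; so P{V ≤ v} ≈ Φ(R). A Monte Carlo sketch with illustrative parameter values of our own choosing (m_x = 10, m_y = 6, σ_x = 1, σ_y = 1.5):

```python
import math
import random
from statistics import NormalDist


def fieller_R(v, mx, my, sx, sy):
    """Fieller's standardized variable R = (mx V - my) / (sx^2 V^2 + sy^2)^{1/2}."""
    return (mx * v - my) / math.sqrt(sx ** 2 * v ** 2 + sy ** 2)


def check(v, mx=10.0, my=6.0, sx=1.0, sy=1.5, trials=100000, seed=3):
    """Simulated P{Y/X <= v} next to the normal approximation Phi(R)."""
    rng = random.Random(seed)
    below = sum(rng.gauss(my, sy) / rng.gauss(mx, sx) <= v for _ in range(trials))
    return below / trials, NormalDist().cdf(fieller_R(v, mx, my, sx, sy))


for v in (0.4, 0.6, 0.8):
    print(v, check(v))
```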

Applying this theorem to the variable (mF/n)^{1/2}, it may be hoped that the
transformed random variable

$$r_1 = r_1(F) = \frac{(2n-1)^{1/2}(mF/n)^{1/2} - [2(m+\lambda) - (m+2\lambda)/(m+\lambda)]^{1/2}}{[(mF/n) + (m+2\lambda)/(m+\lambda)]^{1/2}} \tag{6}$$

will approximately have the unit normal distribution.

To obtain the degree of accuracy of this transformation, the exact probabilities
P{F ≤ x} for given values of m, n, λ and x have been compared with those of the
above approximation. This comparison is also shown in Table 2, and the closeness
of approximation is very satisfactory.
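Transformation (6) can be checked against a simulation of F itself. In the sketch below, the parameter values m = 8, n = 30, λ = 16, x = 3 are our own and are not among those of Table 2. All of the noncentrality is placed on one coordinate, which leaves the distribution of χ'² unchanged, and the chi-square variables are generated as twice a gamma variate.

```python
import math
import random
from statistics import NormalDist


def r1(F, m, n, lam):
    """Square-root transformation (6) of topside noncentral F."""
    mean_y = math.sqrt(2.0 * (m + lam) - (m + 2.0 * lam) / (m + lam))   # Patnaik
    var_y = (m + 2.0 * lam) / (m + lam)
    ratio = m * F / n                                                   # V^2 = mF/n
    return (math.sqrt(2.0 * n - 1.0) * math.sqrt(ratio) - mean_y) / math.sqrt(ratio + var_y)


def simulate_cdf(x, m, n, lam, trials=100000, seed=11):
    """Monte Carlo estimate of P{F <= x}."""
    rng = random.Random(seed)
    shift = math.sqrt(lam)
    count = 0
    for _ in range(trials):
        num = (rng.gauss(0.0, 1.0) + shift) ** 2
        if m > 1:
            num += 2.0 * rng.gammavariate((m - 1) / 2.0, 1.0)   # chi-square, m - 1 d.f.
        den = 2.0 * rng.gammavariate(n / 2.0, 1.0)               # chi-square, n d.f.
        count += (num / m) / (den / n) <= x
    return count / trials


m, n, lam, x = 8, 30, 16.0, 3.0
print(NormalDist().cdf(r1(x, m, n, lam)), simulate_cdf(x, m, n, lam))
```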

3.3. The Cube Root Transformation. Another transformation for normalizing
the noncentral F distribution is obtained in a similar way by using the fact [5]
that (χ²/n)^{1/3} is approximately normally distributed with mean 1 − 2/(9n)
and variance 2/(9n), and that (χ'²/r)^{1/3} is also approximately normally distributed
[1] with mean 1 − 2(1 + B)/(9r) and variance 2(1 + B)/(9r), where
r = m + λ and B = λ/r. Thus

$$[mF/(m+\lambda)]^{1/3} = \frac{(\chi'^2/r)^{1/3}}{(\chi^2/n)^{1/3}}$$

is, insofar as these approximations hold, the ratio of two independent normal
random variables, and thus it may be hoped that the transformed random
variable

$$r_2 = r_2(F) = \frac{[1 - 2/(9n)][mF/(m+\lambda)]^{1/3} - [1 - 2(m+2\lambda)/\{9(m+\lambda)^2\}]}{\{[2/(9n)][mF/(m+\lambda)]^{2/3} + 2(m+2\lambda)/[9(m+\lambda)^2]\}^{1/2}} \tag{7}$$

will approximately have the unit normal distribution. Table 2 shows the approximation
of P{F ≤ x} by means of (7).
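Transformation (7) is just as easy to code. The sketch below (function names ours) also implements Paulson's normalization of central F in its usual form, which (7) should reproduce when λ = 0.

```python
import math


def r2(F, m, n, lam):
    """Cube-root transformation (7) of topside noncentral F."""
    u = (m * F / (m + lam)) ** (1.0 / 3.0)
    cx = 2.0 / (9.0 * n)                                   # variance of (chi^2/n)^{1/3}
    cy = 2.0 * (m + 2.0 * lam) / (9.0 * (m + lam) ** 2)   # variance of (chi'^2/r)^{1/3}
    return ((1.0 - cx) * u - (1.0 - cy)) / math.sqrt(cx * u * u + cy)


def paulson(F, m, n):
    """Paulson's normalization of central F."""
    u = F ** (1.0 / 3.0)
    return ((1.0 - 2.0 / (9 * n)) * u - (1.0 - 2.0 / (9 * m))) / math.sqrt(
        2.0 / (9 * n) * u * u + 2.0 / (9 * m)
    )


for F in (0.5, 1.0, 2.0):
    print(F, r2(F, 5, 12, 0.0), paulson(F, 5, 12))
```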

By substituting λ = 0 in (7), this transformation reduces to the transformation
of Paulson [7] for the central F distribution. From (6) it is seen that an
alternative transformation for normalizing the central F distribution is

$$\frac{(2n-1)^{1/2}(mF/n)^{1/2} - (2m-1)^{1/2}}{[(mF/n) + 1]^{1/2}}.$$

If we put m = 1, then F = χ'²/(χ²/n) reduces to t², the square of the noncentral
variable t. We conjecture that (6) and (7) with m = 1, F = t² (hence λ = δ²), will
transform t approximately to the unit normal distribution.

Although it is known that the Wilson-Hilferty transformation (χ²/n)^{1/3} and the
Abdel-Aty transformation (χ'²/r)^{1/3} are both more nearly normally distributed than the
Fisher transformation (2χ²)^{1/2} and the Patnaik transformation (2χ'²)^{1/2}, respectively
[1], [6], it can be seen from Table 2 that (6) is a better approximation than
(7), at least for the values of m, n, λ and x considered. The reasons for this apparent
inconsistency remain undiscovered. Furthermore, (6) is a better approximation
than that given by Patnaik [6], at least for the values of m, n, λ and x
considered.


4. Acknowledgments. I wish to express my gratitude to Professor H. S. Steyn
for suggesting the original problem that led to this study.

The approximations in the tables were computed on the “ZEBRA” digital
computer of the National Physical Research Laboratory of the South African
Council for Scientific and Industrial Research.


REFERENCES 

[1] Abdel-Aty, S. H., “Approximate formulae for the percentage points and the probability
integral of the non-central χ² distribution,” Biometrika, Vol. 41 (1954),
pp. 538-40.

[2] Fieller, E. C., “The distribution of the index in a normal bivariate population,”
Biometrika, Vol. 24 (1932), pp. 428-440.

[3] Johnson, N. L., “Systems of frequency curves generated by methods of translation,”
Biometrika, Vol. 36 (1949), pp. 149-176.

[4] Johnson, N. L. and Welch, B. L., “Applications of the non-central t-distribution,”
Biometrika, Vol. 31 (1940), pp. 362-389.

[5] Kendall, M. G. and Stuart, A., The Advanced Theory of Statistics, I, 2nd Ed. Griffin
and Co., London, 1958.

[6] Patnaik, P. B., “The non-central χ²- and F-distributions and their applications,”
Biometrika, Vol. 36 (1949), pp. 202-232.

[7] Paulson, Edward, “An approximate normalization of the analysis of variance distribution,”
Ann. Math. Stat., Vol. 13 (1942), pp. 233-235.

[8] Rao, C. Radhakrishna, Advanced Statistical Methods in Biometric Research, John
Wiley and Sons, New York, 1952.

[9] Resnikoff, George J. and Lieberman, Gerald J., Tables of the Non-Central t-distribution,
Stanford University Press, Stanford, Cal., 1957.





PROBABILITY CONTENT OF REGIONS UNDER SPHERICAL NORMAL
DISTRIBUTIONS, II: THE DISTRIBUTION OF THE RANGE IN
NORMAL SAMPLES¹

By HAROLD RUBEN²

Columbia University 

1. Introduction and summary. Numerous investigations, both theoretical and
numerical, have been made of the distribution of the range in normal samples.
One of the first investigators was Student [1] who examined the distribution on
an empirical basis. Somewhat earlier, Tippett [2] presented tables and charts for
the mean, standard deviation, and the measures of skewness and kurtosis
(β₁ and β₂) for the range; further studies of the moments were made by E. S.
Pearson [3], Hartley and Pearson [6] and by Ruben [7]. Tables of the moment
constants are available in [6] and in Pearson and Hartley's book of statistical
tables [10] (Tables 20 and 27), while, more recently, tables of the moment constants
were provided by Harter and Clemm [11].

Approximations to the probability integral and percentage points of the distribution
were suggested by E. S. Pearson [4], Cox [12], Patnaik [14] and Tukey [15].
Pearson's approximations were based on Pearsonian distributions of Types I and
VI, while Cox used the Gamma function (i.e., a multiple of χ² with fractional
degrees of freedom) and Patnaik a multiple of χ with fractional degrees of freedom
as the basis for their approximations. These last two approximations have been
compared by Pearson [5]. An approximation of a different type was derived by
Johnson [16], who gave a series expansion for the probability integral suitable for
low sample sizes and low values of the range. The behavior of the distribution for
large sample size has been studied by Gumbel [17], [18], [19], Elfving [20], Cox
[13] and Hartley and Pearson [21].

For theoretical studies of the exact distribution reference is made to McKay
and Pearson [22], Pillai [23], [24] and Cadwell [25] (see also Hartley [26]). Finally,
tables of the probability integral and percentage points of the distribution have
been provided by Hartley and Pearson [27], [10] (Tables 23 and 22), as well as
by Harter and Clemm [11].

In the present paper, the following new results relating to the range distribution
will be obtained: (i) the latter function may be expressed as the product of
the sample size and the probability content of a certain parallelotope relative to
a hyperspherical normal distribution, and (ii) the function can be evaluated as an
infinite series involving the even moments of a sum of independent truncated
normal variables. The distribution function may also be directly related to the
moment generating function of the square of the sum of these truncated variables.

Received February 13, 1959; revised May 16, 1960.

¹ This research was sponsored in part by the Office of Naval Research under Contract
Number Nonr-266(33), Project Number 042-034. Reproduction in whole or in part is permitted
for any purpose of the United States Government.

² Present address: Department of Statistics, The University, Sheffield, England.

2. Distribution of the range in normal samples. It is well known (see, e.g., [10],
p. 43) that the distribution function of the range in normal samples is given by

$$P_n(w) = n\int_{-\infty}^{\infty} f(x)\,[P(x+w) - P(x)]^{n-1}\,dx, \tag{1}$$

where

$$f(x) = (2\pi)^{-1/2}e^{-x^2/2}, \qquad P(x) = \int_{-\infty}^{x} f(t)\,dt.$$
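Accordingly,

$$P_n(w) = n\int_{-\infty}^{\infty}(2\pi)^{-1/2}e^{-x^2/2}\int_x^{x+w}(2\pi)^{-1/2}e^{-u_1^2/2}\,du_1\cdots\int_x^{x+w}(2\pi)^{-1/2}e^{-u_{n-1}^2/2}\,du_{n-1}\,dx$$

$$= n\int_{-\infty}^{\infty}\int_x^{x+w}\cdots\int_x^{x+w}(2\pi)^{-n/2}\exp\Big(-\tfrac12\Big[x^2 + \sum_{j=1}^{n-1}u_j^2\Big]\Big)\,dx\,du_1\cdots du_{n-1}.$$

On setting $y_j = u_j - x$ (j = 1, 2, ..., n − 1), $x' = x$,

$$P_n(w) = n\int_{-\infty}^{\infty}\int_0^w\cdots\int_0^w(2\pi)^{-n/2}\exp\Big(-\tfrac12\Big[x'^2 + \sum_{j=1}^{n-1}(y_j + x')^2\Big]\Big)\,dx'\,dy_1\cdots dy_{n-1} \tag{2}$$

$$= n^{1/2}\int_0^w\cdots\int_0^w(2\pi)^{-(n-1)/2}e^{-Q/2}\,dy_1\cdots dy_{n-1},$$

after integration with respect to x', where Q is a positive definite quadratic form
in the $y_j$,

$$Q = Q(y_1, y_2, \ldots, y_{n-1}) = \sum_{j=1}^{n-1}y_j^2 - n^{-1}\Big(\sum_{j=1}^{n-1}y_j\Big)^2 \tag{3}$$

(cf. [8]). The orthogonal transformation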

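$$\xi_1 = (n-1)^{-1/2}\sum_{j=1}^{n-1}y_j, \qquad \xi_j = \sum_{k=1}^{n-1}a_{jk}y_k \quad (j = 2, 3, \ldots, n-1), \tag{4}$$

in which the matrix $(a_{jk})$ is orthogonal with first row $a_{1k} = (n-1)^{-1/2}$,
reduces Q to a sum of squares,

$$Q(y_1, y_2, \ldots, y_{n-1}) = n^{-1}\xi_1^2 + \sum_{j=2}^{n-1}\xi_j^2. \tag{5}$$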
Finally, on applying the scaling transformation

$$\xi_1^* = n^{-1/2}\xi_1, \qquad \xi_j^* = \xi_j \quad (j = 2, 3, \ldots, n-1), \tag{6}$$

$$Q(y_1, y_2, \ldots, y_{n-1}) = \sum_{j=1}^{n-1}\xi_j^{*2}. \tag{7}$$


The relationship between the $y_j$ and the $\xi_j^*$ is provided by

$$y_j = \{n/(n-1)\}^{1/2}\xi_1^* + \sum_{k=2}^{n-1}a_{kj}\xi_k^* \quad (j = 1, 2, \ldots, n-1), \tag{8}$$

and (2) reduces to

$$P_n(w) = n\int\cdots\int_R(2\pi)^{-(n-1)/2}\exp\Big(-\tfrac12\sum_{j=1}^{n-1}\xi_j^{*2}\Big)\,d\xi_1^*\cdots d\xi_{n-1}^*, \tag{9}$$

where the region R is the polytope defined by

$$0 \le \{n/(n-1)\}^{1/2}\xi_1^* + \sum_{k=2}^{n-1}a_{kj}\xi_k^* \le w \quad (j = 1, 2, \ldots, n-1). \tag{10}$$

R is a parallelotope since it is bounded by 2(n − 1) flats, each of dimensionality
n − 2, which are parallel in pairs. Note further that one of the $2^{n-1}$ vertices is
at the center of the distribution of the $\xi_j^*$, and that the diagonal of the parallelotope
having the latter vertex as one of its end-points lies along the $\xi_1^*$-axis. The
length of this diagonal is $\{(n-1)/n\}^{1/2}w$.
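The reduction can be checked with one concrete orthogonal matrix. The text only requires an orthogonal matrix whose first row is constant; the Helmert matrix used below is our own choice. The sketch confirms that (3) equals the sum of squares in (7), and that the vertex with every $y_j = w$ lies on the $\xi_1^*$-axis at distance $\{(n-1)/n\}^{1/2}w$.

```python
import math
import random


def helmert(d):
    """A d x d orthogonal matrix whose first row is constant, d^{-1/2} in every entry."""
    rows = [[1.0 / math.sqrt(d)] * d]
    for j in range(1, d):
        norm = math.sqrt(j * (j + 1))
        rows.append([1.0 / norm] * j + [-j / norm] + [0.0] * (d - j - 1))
    return rows


def q_form(y):
    """Equation (3): Q = sum y_j^2 - n^{-1}(sum y_j)^2, with n = len(y) + 1."""
    n = len(y) + 1
    return sum(v * v for v in y) - sum(y) ** 2 / n


def to_xi_star(y):
    """Apply the orthogonal transformation (4) and then the scaling (6)."""
    n = len(y) + 1
    a = helmert(n - 1)
    xi = [sum(a[j][k] * y[k] for k in range(n - 1)) for j in range(n - 1)]
    xi[0] /= math.sqrt(n)
    return xi


y = [random.uniform(0.0, 2.0) for _ in range(4)]     # n = 5
print(q_form(y), sum(v * v for v in to_xi_star(y)))
```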

It now remains to determine the angles at the various vertices between the
flats which bound the parallelotope. Each vertex is characterized by the equations

$$y_{i_\alpha} = w \quad (\alpha = 1, 2, \ldots, k), \qquad y_{j_\beta} = 0 \quad (\beta = 1, 2, \ldots, n-1-k) \tag{11}$$

for k = 0, 1, 2, ..., n − 1, where $(i_1, i_2, \ldots, i_k)$ is a subset of k integers
from the set (1, 2, ..., n − 1) while $(j_1, j_2, \ldots, j_{n-1-k})$ is the complementary
subset. It is understood in (11) that the y's are to be expressed in terms of the
ξ*'s by means of (8). To determine the (n − 1)(n − 2)/2 angles between the
flats in (11) which are interior to the region R, the equations (11) must be replaced
by the inequalities

$$-y_{i_\alpha} \ge -w \quad (\alpha = 1, 2, \ldots, k), \qquad y_{j_\beta} \ge 0 \quad (\beta = 1, 2, \ldots, n-1-k). \tag{12}$$

For convenience, refer to the first k flats in (11) as w-flats and to the remaining
n − 1 − k flats as 0-flats. The angle $\theta_{ij}$ between the ith and jth flats at the
vertex defined by (11) is then given by

$$\cos\theta_{ij} = \mp\frac{n/(n-1) + \sum_{k=2}^{n-1}a_{ki}a_{kj}}{\big[n/(n-1) + \sum_{k=2}^{n-1}a_{ki}^2\big]^{1/2}\big[n/(n-1) + \sum_{k=2}^{n-1}a_{kj}^2\big]^{1/2}} \quad (i \ne j), \tag{13}$$

according as the flats are both w-flats or both 0-flats on the one
hand, or are of different types on the other. Equation (13) simplifies to

$$\cos\theta_{ij} = \mp\tfrac12, \tag{14}$$

depending on whether the two flats are or are not of the same type, on using
the orthogonality property of the matrix in (4). All the dihedral angles are
therefore either 2π/3 or π/3.
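The angles can be confirmed numerically. By (8), the gradient of $y_j$ with respect to ξ* has components $\{n/(n-1)\}^{1/2}, a_{2j}, \ldots, a_{n-1,j}$. A 0-flat has this gradient as inward normal, a w-flat its negative, and the interior dihedral angle is π minus the angle between the inward normals. The sketch again assumes a Helmert matrix for $(a_{jk})$; any orthogonal matrix with constant first row should give the same angles.

```python
import math


def helmert(d):
    """A d x d orthogonal matrix whose first row is constant, d^{-1/2} in every entry."""
    rows = [[1.0 / math.sqrt(d)] * d]
    for j in range(1, d):
        norm = math.sqrt(j * (j + 1))
        rows.append([1.0 / norm] * j + [-j / norm] + [0.0] * (d - j - 1))
    return rows


def flat_gradients(n):
    """Gradient of each y_j with respect to xi*, read off from (8)."""
    a = helmert(n - 1)
    return [[math.sqrt(n / (n - 1))] + [a[k][j] for k in range(1, n - 1)]
            for j in range(n - 1)]


def dihedral_angle(n, i, j, same_type):
    """Interior angle between flats i and j at a vertex of the parallelotope."""
    g = flat_gradients(n)
    dot = sum(p * q for p, q in zip(g[i], g[j]))
    norm = math.sqrt(sum(p * p for p in g[i]) * sum(q * q for q in g[j]))
    sign = 1.0 if same_type else -1.0          # inward normals: +grad for 0-flats, -grad for w-flats
    return math.pi - math.acos(sign * dot / norm)


print(dihedral_angle(5, 0, 1, True), dihedral_angle(5, 0, 1, False))   # 2*pi/3 and pi/3
```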

A formal expression for the probability content of the parallelotope may be obtained by the method of sections [9], the sections being conveniently chosen in this instance to be orthogonal to the $\xi_1^*$-axis. The probability content of that portion of the parallelotope lying between the sections at distances z and z + dz from the center of the distribution (a slab of infinitesimal thickness dz) is $(2\pi)^{-1/2} e^{-z^2/2}\, dz \cdot Q_{n-2}(z; w)$. Here $Q_{n-2}(z; w)$ is the probability content of the (n − 2)-dimensional polytope $T_{n-2}(z; w)$, the intersection of the flat $\xi_1^* = z$ with the parallelotope, relative to the (n − 2)-dimensional spherical normal distribution in the linear subspace $\xi_1^* = z$ with center at the centroid of $T_{n-2}(z; w)$ and with unit variance in any direction. Thus,


$$ P_n(w) = n \int_0^{\{(n-1)/n\}^{1/2} w} (2\pi)^{-1/2} e^{-z^2/2}\, Q_{n-2}(z; w)\, dz \qquad (n = 2, 3, \ldots), \tag{15} $$

where $Q_0(\cdot\,; \cdot) = 1$.
For n = 2, equation (15) reduces to

$$ P_2(w) = 2 \int_0^{2^{-1/2} w} (2\pi)^{-1/2} e^{-z^2/2}\, dz, \tag{16} $$


so that the density function is $\pi^{-1/2} \exp(-\tfrac14 w^2)$, and we obtain the otherwise obvious result that $\tfrac12 w^2$ is distributed as a chi-square with 1 degree of freedom.
For n = 3, (15) is easily shown to be equivalent to

$$ P_3(w) = 12\, V(2^{-1/2} w,\; 6^{-1/2} w), \tag{17} $$

where $V(\cdot\,, \cdot)$ is Nicholson's function [28], defined by

$$ V(h, k) = \frac{1}{2\pi} \int_0^h \int_0^{kx/h} e^{-\frac12 (x^2 + y^2)}\, dy\, dx $$

and tabulated in [28] and [29] (cf. [22] and [29], p. XXXIII). For n > 3, the right-hand member of (15) cannot be expressed in terms of elementary functions. The difficulty arises essentially because the nature of the cross-sectional polytope $T_{n-2}(z; w)$ varies as z increases from 0 to $\{(n-1)/n\}^{1/2} w$; specifically, the number of faces of the parallelotope intersected by the cutting flat varies with its location along the diagonal of the parallelotope lying on the $\xi_1^*$-axis. (For values of z near the lower and upper limits, 0 and $\{(n-1)/n\}^{1/2} w$ respectively, the cross-sections are simplices and the corresponding Q-functions are then the K-functions discussed previously [9].) Nevertheless, (15) should provide a useful point of departure for further study of the range distribution.
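The special cases (16) and (17) can be checked against a direct numerical evaluation of the range distribution. The sketch below compares them with the standard single-integral form $P_n(w) = n\int \varphi(x)\{\Phi(x+w)-\Phi(x)\}^{n-1}dx$, which is used here only as an independent reference. For Nicholson's V, the inner integral is done in closed form, $V(h,k) = \int_0^h \varphi(x)\{\Phi(kx/h) - \tfrac12\}\,dx$. The Simpson step counts and integration limits are illustrative choices.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def range_cdf(n, w):
    """Reference: P_n(w) = n * int phi(x) [Phi(x + w) - Phi(x)]^(n-1) dx."""
    return n * simpson(lambda x: phi(x) * (Phi(x + w) - Phi(x)) ** (n - 1), -10.0, 10.0)


def P2(w):
    """Equation (16): 2 * int_0^{w / sqrt 2} phi(z) dz, which equals erf(w / 2)."""
    return math.erf(w / 2.0)


def nicholson_V(h, k):
    """V(h, k), with the inner y-integral replaced by phi(x) * (Phi(kx/h) - 1/2)."""
    return simpson(lambda x: phi(x) * (Phi(k * x / h) - 0.5), 0.0, h)


def P3(w):
    """Equation (17), valid for w > 0."""
    return 12.0 * nicholson_V(w / math.sqrt(2.0), w / math.sqrt(6.0))


for w in (0.5, 1.0, 2.0, 3.0):
    print(w, P2(w), range_cdf(2, w), P3(w), range_cdf(3, w))
```

Since erf(w/2) is also the chi-square (1 d.f.) distribution function evaluated at $\tfrac12 w^2$, the n = 2 line restates the remark following (16).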





SPHERICAL NORMAL DISTRIBUTIONS 1117 


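We now obtain a series expansion for $P_n(w)$ from equation (2) as follows:

$$ \begin{aligned} P_n(w) &= n^{1/2} \int_0^w \cdots \int_0^w (2\pi)^{-(n-1)/2} \exp\Big(-\tfrac12 \sum_{j=1}^{n-1} y_j^2\Big) \sum_{r=0}^{\infty} \Big(\sum_{j=1}^{n-1} y_j\Big)^{2r} \Big/ \big[(2n)^r r!\big]\; dy_1 \cdots dy_{n-1} \\ &= \frac{n^{1/2}}{2^{n-1} \pi^{(n-1)/2}} \sum_{r=0}^{\infty} \frac{(2r)!}{n^r\, r!}\, C_{2r}^{(n)}(\tfrac12 w^2), \end{aligned} \tag{18} $$

where

$$ C_{2r}^{(n)}(\tfrac12 w^2) = \sum_{k_1 + \cdots + k_{n-1} = 2r} \; \prod_{j=1}^{n-1} \Gamma_{(k_j+1)/2}(\tfrac12 w^2) \big/ k_j!, \tag{19} $$

and $\Gamma_m(z)$ is the Incomplete Gamma Function, $\Gamma_m(z) = \int_0^z e^{-u} u^{m-1}\, du$ (m > 0). In general, however, it will be difficult to evaluate the $C_{2r}^{(n)}$, even for moderate n, by direct enumeration of the partition integers $k_j$ and reference to tables of the Gamma Function [30]. An alternative and more convenient series development for $P_n(w)$ follows directly from (2) in the form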


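$$ P_n(w) = n^{1/2} \{G(w)\}^{n-1} \sum_{r=0}^{\infty} \frac{M_{n-1}^{(2r)}(0; w)}{(2n)^r\, r!}, \tag{20} $$

where

$$ G(w) = (2\pi)^{-1/2} \int_0^w e^{-\frac12 x^2}\, dx, \tag{21} $$

$$ \begin{aligned} M_{n-1}(\theta; w) &= \{G(w)\}^{-(n-1)} \int_0^w \cdots \int_0^w (2\pi)^{-(n-1)/2} \exp\Big(-\tfrac12 \sum_{j=1}^{n-1} y_j^2 + \theta \sum_{j=1}^{n-1} y_j\Big)\, dy_1 \cdots dy_{n-1} \\ &= \big[e^{\frac12 \theta^2} \{F(w - \theta) - F(-\theta)\} / G(w)\big]^{n-1}, \end{aligned} \tag{22} $$

$$ M_{n-1}^{(2r)}(0; w) = \Big[\frac{d^{2r} M_{n-1}(\theta; w)}{d\theta^{2r}}\Big]_{\theta = 0} = \mu'_{2r}\Big(\sum_{j=1}^{n-1} y_j\Big). \tag{23} $$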
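Although the text notes that direct enumeration of the partitions in (19) is laborious, the inner sum in (19) is the coefficient of $t^{2r}$ in $\{\sum_k \Gamma_{(k+1)/2}(\tfrac12 w^2)\, t^k/k!\}^{n-1}$. Repeated convolution therefore produces every $C_{2r}^{(n)}$ at once. The sketch below evaluates (18)-(19) this way and compares the result with the standard single-integral form of the range distribution function. The truncation point `rmax` and the power-series evaluation of $\Gamma_m(z)$ are our own choices, not the paper's.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def range_cdf(n, w, a=-10.0, b=10.0, m=2000):
    """Reference value n * int phi(x) [Phi(x + w) - Phi(x)]^(n-1) dx (Simpson rule)."""
    h = (b - a) / m
    g = lambda x: phi(x) * (Phi(x + w) - Phi(x)) ** (n - 1)
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, m))
    return n * s * h / 3.0


def lower_gamma(m, z):
    """Gamma_m(z) = int_0^z e^{-u} u^{m-1} du = z^m e^{-z} sum_i z^i / (m (m+1) ... (m+i))."""
    term = 1.0 / m
    total = term
    i = 0
    while term > 1e-17 * total:
        i += 1
        term *= z / (m + i)
        total += term
    return z ** m * math.exp(-z) * total


def range_cdf_series(n, w, rmax=60):
    """Equations (18)-(19), truncated after r = rmax."""
    z = 0.5 * w * w
    size = 2 * rmax + 1
    g = [lower_gamma((k + 1) / 2.0, z) / math.factorial(k) for k in range(size)]
    # c[s] = sum over k_1 + ... + k_{n-1} = s of prod_j g[k_j]; repeated convolution
    # yields every C_{2r}^{(n)} at once instead of enumerating the partitions.
    c = [1.0] + [0.0] * (size - 1)
    for _ in range(n - 1):
        c = [sum(c[s - k] * g[k] for k in range(s + 1)) for s in range(size)]
    total = sum(math.factorial(2 * r) / (n ** r * math.factorial(r)) * c[2 * r]
                for r in range(rmax + 1))
    return math.sqrt(n) * math.pi ** (-(n - 1) / 2) * 2.0 ** (-(n - 1)) * total


for n in (2, 3, 4):
    print(n, range_cdf_series(n, 1.5), range_cdf(n, 1.5))
```

All terms of (18) are positive, so truncation is the only approximation. Each term is bounded by $\{(n-1)w\}^{2r}/\{(2n)^r r!\}$, which makes the required `rmax` easy to judge for a given n and w.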


The series in equation (20) seems to be computationally convenient for low or moderate n. It should be remarked that $M_{n-1}(\theta; w)$ is the moment generating function of the sum of n − 1 independent standardized normal variates $y_1, y_2, \ldots, y_{n-1}$, each of which has been truncated at 0 and at w, while the $M_{n-1}^{(2r)}(0; w)$ then give a representation of $P_n(w)$ in terms of the even moments of the sum variate. The $\mu'_{2r}(\sum_{j=1}^{n-1} y_j)$ can be obtained as polynomial functions of the $\mu'_k(y_j)$, the moments of a standardized normal variate truncated at 0 and at w. The moment generating function of the latter variate is


$$ M_1(\theta; w) = e^{\frac12 \theta^2} \{F(w - \theta) - F(-\theta)\} / G(w), \tag{24} $$







from which the recursive relation

$$ M_1^{(m+1)}(0; w) = m\, M_1^{(m-1)}(0; w) - w^m f(w)/G(w) \qquad (m = 1, 2, \ldots) \tag{25} $$

is readily deduced. (Observe that $M_1^{(k)}(0; w) = \mu'_k(y)$.) An explicit formula for $\mu'_k(y)$ may also be obtained in the form

$$ \mu'_k(y) = -\frac{1}{G(w)} {\sum_j}' \binom{k}{j} \frac{(k-j)!}{2^{(k-j)/2}\, \{\tfrac12 (k-j)\}!} \{I_{j+1}(w) - I_{j+1}(0)\}, \tag{26} $$

where $\sum'$ indicates that summation is to be effected over those j from the set (0, 1, 2, ..., k) for which k − j is even, and


$$ I_{j+1}(z) = f(z)\, H_j(z) \quad (j = 1, 2, \ldots), \qquad I_1(z) = -F(z), \tag{27} $$

$H_j(z)$ being the Tchebycheff-Hermite polynomial of degree j − 1 in z. In particular, for the even moments of y,


$$ \mu'_{2r}(y) = 1 \cdot 3 \cdots (2r - 1) - \{G(w)\}^{-1} \sum_{m=1}^{r} \big[2^{m-r} (2m+1)(2m+2) \cdots (2r) / (r-m)!\big]\, f(w)\, H_{2m}(w) \qquad (r = 1, 2, \ldots). \tag{28} $$


However, equation (25) seems to be more convenient for the purpose of com- 
putation. The first ten moments of y are given by equation (25) as 


$$ \mu'_1(y) = \{f(0) - f(w)\}/G(w), \tag{29} $$
$$ \mu'_2(y) = 1 - w f(w)/G(w), \tag{30} $$
$$ \mu'_3(y) = \{2f(0) - 2f(w) - w^2 f(w)\}/G(w), \tag{31} $$
$$ \mu'_4(y) = 3 - (w^3 + 3w) f(w)/G(w), \tag{32} $$
$$ \mu'_5(y) = \{8f(0) - 8f(w) - 4w^2 f(w) - w^4 f(w)\}/G(w), \tag{33} $$
$$ \mu'_6(y) = 15 - (w^5 + 5w^3 + 15w) f(w)/G(w), \tag{34} $$
$$ \mu'_7(y) = \{48f(0) - 48f(w) - 24w^2 f(w) - 6w^4 f(w) - w^6 f(w)\}/G(w), \tag{35} $$
$$ \mu'_8(y) = 105 - (w^7 + 7w^5 + 35w^3 + 105w) f(w)/G(w), \tag{36} $$
$$ \mu'_9(y) = \{384f(0) - 384f(w) - 192w^2 f(w) - 48w^4 f(w) - 8w^6 f(w) - w^8 f(w)\}/G(w), \tag{37} $$
$$ \mu'_{10}(y) = 945 - (w^9 + 9w^7 + 63w^5 + 315w^3 + 945w) f(w)/G(w). \tag{38} $$
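Recursion (25) is easy to run mechanically, and it doubles as a check on the closed forms (29)-(38). The sketch below assumes, as the surrounding formulas suggest, that f and F are the standard normal density and distribution function, so that $G(w) = F(w) - \tfrac12$. It generates $\mu'_0, \ldots, \mu'_{10}$ from (25) and compares the tenth moment with (38).

```python
import math


def f(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def G(w):
    """Equation (21): the integral of f from 0 to w."""
    return 0.5 * math.erf(w / math.sqrt(2.0))


def truncated_moments(w, kmax):
    """mu'_k(y), k = 0..kmax, for a standard normal truncated at 0 and w, via (25)."""
    mu = [1.0, (f(0.0) - f(w)) / G(w)]          # mu'_0 and mu'_1, equation (29)
    for m in range(1, kmax):
        mu.append(m * mu[m - 1] - w ** m * f(w) / G(w))
    return mu[: kmax + 1]


def mu10_closed_form(w):
    """Equation (38)."""
    return 945 - (w**9 + 9 * w**7 + 63 * w**5 + 315 * w**3 + 945 * w) * f(w) / G(w)


w = 2.0
mu = truncated_moments(w, 10)
print(mu[10], mu10_closed_form(w))
```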


The even moments of $\sum_{j=1}^{n-1} y_j$ required in formula (20) may be obtained from the moments of y by using the relationship


3 For r = m, the product (2m + 1)(2m + 2) ⋯ (2r) is to be interpreted as 1.







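$$ \mu'_{2r}\Big(\sum_{j=1}^{n-1} y_j\Big) = E\Big[(2r)! \sum_{k_1 + \cdots + k_{n-1} = 2r} \; \prod_{j=1}^{n-1} y_j^{k_j} / k_j!\Big] = (2r)! \sum_{k_1 + \cdots + k_{n-1} = 2r} \; \prod_{j=1}^{n-1} \mu'_{k_j} / k_j!, \tag{39} $$

where $\mu'_{k_j}$ refers to $\mu'_{k_j}(y)$.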
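This gives for the first three even moments of $\sum_{j=1}^{n-1} y_j$

$$ \begin{aligned} \mu'_2\Big(\sum y_j\Big) &= (n-1)\mu'_2 + (n-1)_2 (\mu'_1)^2, \\ \mu'_4\Big(\sum y_j\Big) &= (n-1)\mu'_4 + 4(n-1)_2 \mu'_3 \mu'_1 + 3(n-1)_2 (\mu'_2)^2 \\ &\quad + 6(n-1)_3 (\mu'_1)^2 \mu'_2 + (n-1)_4 (\mu'_1)^4, \\ \mu'_6\Big(\sum y_j\Big) &= (n-1)\mu'_6 + 6(n-1)_2 \mu'_5 \mu'_1 + 15(n-1)_2 \mu'_4 \mu'_2 \\ &\quad + 10(n-1)_2 (\mu'_3)^2 + 15(n-1)_3 (\mu'_1)^2 \mu'_4 + 60(n-1)_3 \mu'_1 \mu'_2 \mu'_3 \\ &\quad + 15(n-1)_3 (\mu'_2)^3 + 20(n-1)_4 (\mu'_1)^3 \mu'_3 + 45(n-1)_4 (\mu'_1)^2 (\mu'_2)^2 \\ &\quad + 15(n-1)_5 (\mu'_1)^4 \mu'_2 + (n-1)_6 (\mu'_1)^6, \end{aligned} \tag{40} $$

where $(n-1)_r = (n-1)(n-2) \cdots (n-r)$.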
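Relationship (39) and the expansions (40) are algebraic identities in the moments $\mu'_k$ of independent, identically distributed summands. They can therefore be checked on any distribution whose sum is easy to handle exactly. The sketch below uses Bernoulli(½) variates purely as an illustration (they are not the truncated normals of the text). The sum is then binomial, and its sixth moment can be found by enumeration. (39) is evaluated by repeated convolution of $\mu'_k/k!$, and the third line of (40) is typed in term by term.

```python
import math


def sum_moment(mu, N, order):
    """Equation (39): E[(y_1 + ... + y_N)^order] for i.i.d. y_j with raw moments mu[k]."""
    e = [mu[k] / math.factorial(k) for k in range(order + 1)]
    c = [1.0] + [0.0] * order
    for _ in range(N):            # convolve once per summand
        c = [sum(c[s - k] * e[k] for k in range(s + 1)) for s in range(order + 1)]
    return math.factorial(order) * c[order]


def falling(N, r):
    """(n - 1)_r = (n-1)(n-2)...(n-r), written here with N = n - 1."""
    out = 1
    for i in range(r):
        out *= N - i
    return out


def mu6_eq40(mu, N):
    """Third line of (40)."""
    m1, m2, m3, m4, m5, m6 = mu[1:7]
    F = lambda r: falling(N, r)
    return (F(1) * m6 + 6 * F(2) * m5 * m1 + 15 * F(2) * m4 * m2 + 10 * F(2) * m3**2
            + 15 * F(3) * m1**2 * m4 + 60 * F(3) * m1 * m2 * m3 + 15 * F(3) * m2**3
            + 20 * F(4) * m1**3 * m3 + 45 * F(4) * m1**2 * m2**2
            + 15 * F(5) * m1**4 * m2 + F(6) * m1**6)


# Illustration only: y_j ~ Bernoulli(1/2), so mu'_k = 1/2 for k >= 1 and the sum
# of N of them is Binomial(N, 1/2).
mu = [1.0] + [0.5] * 6
for N in range(1, 8):
    exact = sum(math.comb(N, s) * s**6 for s in range(N + 1)) / 2**N
    print(N, exact, sum_moment(mu, N, 6), mu6_eq40(mu, N))
```

A quick consistency check on any such expansion: grouping its coefficients by the falling factorial $(n-1)_j$ must reproduce the Stirling numbers of the second kind (for the sixth moment 1, 31, 90, 65, 15, 1).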


3. Concluding remarks. It may be useful to conclude with some remarks con- 
cerning further possible research on the distribution of the range in normal samples. 

The distribution function of the range has been expressed in terms of the probability content of a parallelotope relative to a centered spherical normal distribution with unit standard deviation in any direction (equation (9)). This probability content was subsequently represented as an infinite series (equation (20)). It would be desirable to obtain alternative expressions for the probability content and, in particular, suitable approximations for both moderate and large n. It seems likely, for example, that (15) should provide good approximation formulae for the distribution and its percentage points. One such approximation is obtained by replacing $T_{n-2}(z; w)$ by the (n − 2)-dimensional sphere of equal volume-content and, correspondingly, the Q-function by the distribution function of a chi-square with n − 2 degrees of freedom (or an Incomplete Gamma Function).

Two further approximations worthy of further study may be obtained in the 
following manner: 

(a) The volume-content of the parallelotope R in (9) is $n^{-1/2} w^{n-1}$, and the center C of R is at the point $(\tfrac12 \{(n-1)/n\}^{1/2} w, 0, \ldots, 0)$ in $\xi^*$-space. This suggests that R may be replaced approximately by a sphere of volume $n^{-1/2} w^{n-1}$ with center at C. It will be noted that this is equivalent to approximating the distribution of $w^2$ by a non-central $\chi^2$ with n − 1 degrees of freedom.

(b) Replace R by a spherical sector of equivalent volume-content with its pole at the center O of the spherical normal distribution and with its n − 1 bounding flats determined by the n − 1 faces of R which meet at O (the dihedral angle between any two of the bounding flats is then $2\pi/3$). This is equivalent to approximating w by a multiple of a $\chi$ variate with n − 1 degrees of freedom (cf. [14]). Preliminary computation indicates that this approach yields good approximations for the moments of w, provided n is not too large.


REFERENCES 


[1] "Student," "Errors in routine analysis," Biometrika, Vol. 19 (1927), pp. 151-164.
[2] L. H. C. Tippett, "On the extreme individuals and the range of samples taken from a normal population," Biometrika, Vol. 17 (1925), pp. 364-387.
[3] E. S. Pearson, "A further note on the distribution of range in samples taken from a normal population," Biometrika, Vol. 18 (1926), pp. 173-194.
[4] E. S. Pearson, "The percentage limits for the distribution of range in samples from a normal population (n ≤ 100)," Biometrika, Vol. 24 (1932), pp. 404-417.
[5] E. S. Pearson, "Comparison of two approximations to the distribution of the range in small samples from normal populations," Biometrika, Vol. 39 (1952), pp. 130-136.
[6] H. O. Hartley and E. S. Pearson, "Moment constants for the distribution of range in normal samples," Biometrika, Vol. 38 (1951), pp. 463-464.
[7] H. Ruben, "On the moments of the range and product moments of extreme order statistics in normal samples," Biometrika, Vol. 43 (1956), pp. 458-460.
[8] H. Ruben, "On the moments of order statistics in samples from normal populations," Biometrika, Vol. 41 (1954), pp. 201-227.
[9] H. Ruben, "Probability contents of regions under spherical normal distributions, I," Ann. Math. Stat., Vol. 31 (1960), pp. 598-618.
[10] E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Vol. 1, Cambridge Univ. Press, 1954.
[11] Leon Harter and Donald S. Clemm, "The probability integrals of the range and of the studentised range. Probability integral, percentage points, and moments of the range," W.A.D.C. Technical Report No. 58-484, Vol. 1, 1959.
[12] D. R. Cox, "The use of range in sequential analysis," J. Roy. Stat. Soc. Ser. B, Vol. 11 (1949), pp. 101-114.
[13] D. R. Cox, "A note on the asymptotic distribution of range," Biometrika, Vol. 35 (1948), pp. 310-315.
[14] P. B. Patnaik, "The use of mean range as an estimator of variance in statistical tests," Biometrika, Vol. 37 (1950), pp. 78-87.
[15] J. W. Tukey, "Interpolations and approximations related to the normal range," Biometrika, Vol. 42 (1955), pp. 480-485.
[16] N. L. Johnson, "Approximations to the probability integral of the distribution of the range," Biometrika, Vol. 39 (1952), pp. 417-419.
[17] E. J. Gumbel, "Ranges and midranges," Ann. Math. Stat., Vol. 15 (1944), pp. 414-422.
[18] E. J. Gumbel, "The distribution of the range," Ann. Math. Stat., Vol. 18 (1947), pp. 384-412.
[19] E. J. Gumbel, "Probability tables for the range," Biometrika, Vol. 36 (1949), pp. 142-148.
[20] G. Elfving, "The asymptotical distribution of range in samples from a normal population," Biometrika, Vol. 34 (1947), pp. 111-119.
[21] B. I. Harley and E. S. Pearson, "The distribution of range in normal samples with n = 200," Biometrika, Vol. 44 (1957), pp. 257-260.
[22] A. T. McKay and E. S. Pearson, "A note on the distribution of range in samples of n," Biometrika, Vol. 25 (1933), pp. 415-420.
[23] K. C. S. Pillai, "A note on ordered samples," Sankhyā, Vol. 8 (1948), pp. 375-380.
[24] K. C. S. Pillai, "On the distributions of mid-range and semi-range in samples from a normal population," Ann. Math. Stat., Vol. 21 (1950), pp. 100-105.
[25] J. H. Cadwell, "The distribution of quasiranges from a normal population," Ann. Math. Stat., Vol. 24 (1953), pp. 603-613.
[26] H. O. Hartley, "The range in random samples," Biometrika, Vol. 32 (1942), pp. 334-348.
[27] H. O. Hartley and E. S. Pearson, "The probability integral of the range in samples of n observations from a normal population," Biometrika, Vol. 32 (1942), pp. 301-310.
[28] C. Nicholson, "The probability integral for two variables," Biometrika, Vol. 33 (1943), pp. 59-72.
[29] National Bureau of Standards, Tables of the Bivariate Normal Distribution Function and Related Functions, U. S. Government Printing Office, Washington, D. C., 1958.
[30] Karl Pearson, Tables of the Incomplete Γ-Function, Cambridge Univ. Press, 1922.





TABLES OF RANGE AND STUDENTIZED RANGE¹


By H. Leon Harter 
Aeronautical Research Laboratories, Wright-Patterson Air Force Base 


0. Summary. A description is given of the computation of tables of percentage points of the range, moments of the range, and percentage points of the studentized range for samples from a normal population. Percentage points of the (standardized) range W = w/σ corresponding to cumulative probability P = 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1 (0.1) 0.9, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995 and 0.9999 are given to six decimal places for samples of size n = 2 (1) 20 (2) 40 (10) 100. Moments (mean, variance, skewness, and elongation) of the range W are given to eight or more significant figures for samples of size n = 2 (1) 100. Percentage points of the studentized range Q = w/s corresponding to cumulative probability P = 0.9, 0.95, 0.975, 0.99, 0.995, and 0.999 are given to four significant figures or four decimal places, whichever is less accurate, for samples of size n = 2 (1) 20 (2) 40 (10) 100, with ν = 1 (1) 20, 24, 30, 40, 60, 120, and ∞ degrees of freedom for the independent estimate s² of the population variance. All tabular values are accurate to within a unit in the last place.


1. History and Introduction. 

1.1. Theory and Tables. One of the earliest writers to give serious consideration to the use of the sample range as a measure of dispersion was Tippett [26]. Tippett tabulated (to five decimal places) the mean range (in terms of the population standard deviation) of samples of size n taken from a normal population, for n = 2 (1) 1000. He also calculated a few values of $\sigma_W$, $\beta_{1,W}$, and $\beta_{2,W}$. This work was extended by E. S. Pearson [16], who computed modified values of $\beta_1$ and $\beta_2$ for n = 60 and 100. "Student" [24] fitted Pearson curves to the first four theoretical moments of the range for several values of n. Using "Student's" results with modifications, E. S. Pearson [17] computed lower and upper 0.5%, 1%, 5%, and 10% points (to two decimal places) for n = 2 (1) 30 (5) 100. In collaboration with McKay [13], he found the exact distribution of the range for samples of 3 from a normal population, and gave certain new results regarding the form of the range curve at its terminals.

The idea of using the ratio of the range to an independent estimate s of the population standard deviation (the studentized range) was first proposed by "Student" in a letter written to, and later published by, E. S. Pearson [19]. Newman [14] developed this notion further, and tabulated a number of 5% and 1% points of Q = w/s, which he obtained by quadrature from the approximate probability law of W = w/σ given by E. S. Pearson [17], except for n = 3, where the exact distribution of W obtained by McKay and Pearson [13] was used.


Received October 2, 1959; revised July 30, 1960. 
1 Adapted from an Invited Address presented on April 2, 1959 at the Cleveland meeting 
of the Institute of Mathematical Statistics. 







RANGE AND STUDENTIZED RANGE TABLES 1123 


The probability integral of the range was first tabulated by Pearson and Hartley [20], for n = 2 (1) 20, W = 0.00 (0.05) 7.25. Values were computed to five decimal places, but only four decimal places were published. Included in the same paper was a table of lower and upper 0.1%, 0.5%, 1.0%, 2.5%, 5%, and 10% points for n = 2 (1) 12. The theoretical basis for the computations was outlined by Hartley [8] in another paper.

Pearson and Hartley [21] also tabulated the probability integral of the studentized range, which they wrote in the form

$$ P_{n,\nu}(Q) = P_n(Q) + \nu^{-1} a_n(Q) + \nu^{-2} b_n(Q), $$

where ν is the number of degrees of freedom for the independent estimate s of the population standard deviation and $P_n(Q)$ is the probability integral of the range. The tables gave values (to four, two and one decimal places, respectively) of $P_n(Q)$, $a_n(Q)$, and $b_n(Q)$ for Q = 0.00 (0.25) 6.50 and n = 3 (1) 20, with the observation that the results were somewhat inaccurate for small values of ν (< 10) and for large values of Q (> 6). Lower and upper 5% and 1% points were given for n = 2 (1) 20, ν = 10 (1) 20, 24, 30, 40, 60, 120, and ∞, the results for n = 2 being obtained by multiplying the corresponding percentage points of Student's t distribution by $2^{1/2}$. Upper 5% and 1% points for n = 3 (1) 20 and ν = 10 (1) 19 were tabulated to only one decimal place, with two decimal places for the other tabular values. May [12] published extended and corrected tables of the upper percentage points. Values were given to two decimal places down through ν = 2 and to one decimal place for ν = 1. Values were based on exact quadrature for ν = 1 (1) 4, 6, 8, 12, and 24 and for 6 to 8 equidistant values of 1/Q, with other values obtained by Lagrangian and/or harmonic interpolation. Hartley [9] published further corrections for ν = 5 and ν = 7.
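The n = 2 rule quoted above can be seen directly. For n = 2, $w/\sigma = 2^{1/2}|z|$ with z standard normal, so $Q/2^{1/2} = |z|/(s/\sigma)$ is the absolute value of a Student t variate with ν degrees of freedom. Hence the percentage points of Q are $2^{1/2}$ times those of |t|. The sketch below checks this numerically by averaging the n = 2 range distribution function erf(x/2) over the distribution of $s/\sigma = (\chi^2_\nu/\nu)^{1/2}$. It compares the result with the closed-form |t| distribution functions for ν = 1 and ν = 2. The integration limit and step count are illustrative choices, and this is not the method of [21] or [12].

```python
import math


def simpson(f, a, b, m=4000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def s_density(s, nu):
    """Density of s / sigma = sqrt(chi^2_nu / nu), for s > 0."""
    log_c = 0.5 * nu * math.log(nu) - (0.5 * nu - 1) * math.log(2.0) - math.lgamma(0.5 * nu)
    return math.exp(log_c + (nu - 1) * math.log(s) - 0.5 * nu * s * s)


def studentized_range_cdf_n2(q, nu, smax=12.0):
    """P(Q <= q) for n = 2: average P_2(q s) = erf(q s / 2) over the law of s."""
    integrand = lambda s: math.erf(0.5 * q * s) * s_density(s, nu) if s > 0.0 else 0.0
    return simpson(integrand, 0.0, smax)


def t_abs_cdf(x, nu):
    """P(|t_nu| <= x), coded in closed form only for nu = 1 and nu = 2."""
    if nu == 1:
        return 2.0 / math.pi * math.atan(x)
    if nu == 2:
        return x / math.sqrt(2.0 + x * x)
    raise ValueError("closed form only coded for nu = 1, 2")


for nu in (1, 2):
    for q in (1.0, 3.0, 10.0):
        print(nu, q, studentized_range_cdf_n2(q, nu), t_abs_cdf(q / math.sqrt(2.0), nu))
```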


1.2. Applications. One of the earliest applications of the sample range was in statistical quality control. Shewhart [23], the father of statistical quality control, considered and tentatively rejected the use of the range as a measure of dispersion, but E. S. Pearson [18] justified its use and tabulated factors for the calculation of control limits for the range. Because the range is easier to calculate, it soon replaced the standard deviation in most applications of quality control.

E. S. Pearson [17] showed that if the population standard deviation is to be estimated from the mean range of a number of subgroups, the optimum subgroup size is 8, but it remained for Grubbs and Weaver [3] to make an extensive study of the use of this method.

Newman [14], following a suggestion by “Student”, proposed a systematic 
method of applying the studentized range to multiple comparisons of treatment 
means in an experiment, but it was a number of years before any further work 
on this subject was published. 

The use of the range in estimating all the variances involved in the analysis 
of variance was explored by Rodgers [22], who credited W. J. Jennett with sug- 
gesting this procedure. Further work on the use of the range in the analysis of 





1124 H. LEON HARTER 


variance was done by Patnaik [15], Hartley [10], and David [1], but the use of the 
range in the overall analysis (as contrasted with the use of the studentized range 
in the multiple comparisons procedures which may be used to determine which 
means differ significantly when overall significance has been established) has 
never met with wide acceptance. 

One of the notable occurrences of the past ten years has been the development 
of a number of multiple comparisons procedures, several of which are based on 
the studentized range. Among these are the Newman-Keuls test (a modification 
by Keuls [11] of a test first proposed by Newman [14]), two tests proposed by 
Tukey [27], [28], and the new multiple range test proposed by Duncan [2]. The 
author [4] made a study of the error rates and sample sizes for a number of 
multiple comparisons tests based on the studentized range. 


2. Computation of the Tables. 

2.1. Percentage Points of the Range. For samples of size n from N(μ, 1), the
values of the range corresponding to certain cumulative probabilities P were
computed by inverse interpolation, using Aitken’s method with a tolerance of 
5 x 10°’, in a new table of the probability integral of the range. The probability 
integral was computed at an interval of 2~° in W for n = 2 (1) 20 (2) 40 (10) 100, 
accurate to within a unit in the eighth decimal place. The method of computation 
of the probability integral, and a table at an interval of 0.01 in W, obtained by 
subtabulation, have been given by Harter and Clemm [6], and will not be in- 
cluded here. The percentage points were computed for the same values of n as 
the probability integral. The cumulative probabilities P involved were 0.0001, 
0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1 (0.1) 0.9, 0.95, 0.975, 0.99, 0.995, 0.999, 
0.9995, and 0.9999. The interpolation was performed on the Univac Scientific 
(ERA 1103A) computer. In some cases, it was necessary to divide each of three 
intervals in the relevant portion of the table into eight subintervals and sub- 
tabulate the probability integral, in order to meet the tolerance on the inverse 
interpolation. This happened especially for small values of P and for values of 
n near 4. The resulting percentage points were rounded to six decimal places, 
with error not exceeding a unit in the last place, then punched on cards and 
printed on the IBM 407 tabulator. The results are shown in Table 1. 
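Inverse interpolation of this kind can be sketched with the Aitken-Neville scheme. W is treated as a function of the tabulated P, and the table points nearest the target probability are added one at a time. The process stops when two successive estimates agree within a tolerance. In the illustration below, the tabulated function is the closed-form n = 2 range distribution function erf(W/2). The spacing 1/32, the tolerance and the 16-point cap are assumptions chosen for the example, not the settings of the computation described above.

```python
import math


def aitken_neville_inverse(ps, ws, target, tol, max_points=16):
    """Estimate the W at which the tabulated P equals target (inverse interpolation).

    Table points are taken nearest-first and combined by the Aitken-Neville
    scheme; iteration stops when two successive estimates agree within tol.
    """
    order = sorted(range(len(ps)), key=lambda i: abs(ps[i] - target))[:max_points]
    ys = [ps[i] for i in order]          # interpolation abscissae: probabilities
    xs = [ws[i] for i in order]          # values being interpolated: ranges W
    prev_row, prev_est = [], None
    for i in range(len(xs)):
        row = [0.0] * (i + 1)
        row[i] = xs[i]
        for j in range(i - 1, -1, -1):
            row[j] = ((target - ys[j]) * row[j + 1]
                      - (target - ys[i]) * prev_row[j]) / (ys[i] - ys[j])
        est = row[0]                     # interpolant through points 0..i at target
        if prev_est is not None and abs(est - prev_est) < tol:
            return est
        prev_row, prev_est = row, est
    raise ValueError("tolerance not met with max_points points")


# Hypothetical table: the n = 2 range c.d.f. P_2(W) = erf(W / 2) at spacing 1/32.
ws = [i / 32 for i in range(321)]
ps = [math.erf(w / 2.0) for w in ws]
w95 = aitken_neville_inverse(ps, ws, 0.95, 5e-8)
print(w95)
```

For P = 0.95 the result should agree with $2^{1/2} \times 1.959964 \approx 2.771808$, the upper 5% point of the range for n = 2.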

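2.2. Moments of the Range. The expected value of the kth power of the range W, for samples of size n from N(μ, 1), is given by the equation

$$ E(W^k) = n(n-1) \int_{-\infty}^{\infty} \int_0^{\infty} W^k\, [\Phi(X + W) - \Phi(X)]^{n-2}\, \varphi(X + W)\, dW\, \varphi(X)\, dX, \tag{1} $$

with

$$ \varphi(X) = (2\pi)^{-1/2} e^{-X^2/2}, \qquad \Phi(X) = \int_{-\infty}^{X} \varphi(t)\, dt. $$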
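Equation (1) is a double integral that a short program can evaluate directly. The sketch below uses Simpson's rule for the inner W-integral, as a stand-in for the seven-point Lagrangian formula described in Section 2.2, and the trapezoidal rule in X for the outer integral. The trapezoidal rule is very accurate here because the outer integrand dies away smoothly at both limits. The limits and step sizes are illustrative choices. For n = 2 the sketch reproduces the closed forms E(W) = 2/π^{1/2} and E(W²) = 2, and for n = 3 it reproduces E(W) = 3/π^{1/2}.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a, b, m):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def range_moment(n, k, h=0.1, xlim=8.0, wmax=12.0, m=480):
    """E(W^k) from equation (1): Simpson rule in W (inner), trapezoidal rule in X (outer)."""
    def inner(x):
        return simpson(lambda w: w ** k * (Phi(x + w) - Phi(x)) ** (n - 2) * phi(x + w),
                       0.0, wmax, m)

    steps = int(round(2.0 * xlim / h))
    total = 0.0
    for i in range(steps + 1):
        x = -xlim + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoidal end weights
        total += weight * inner(x) * phi(x)
    return n * (n - 1) * h * total


print(range_moment(2, 1), 2 / math.sqrt(math.pi))
print(range_moment(3, 1), 3 / math.sqrt(math.pi))
```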





TABLE 1
Percentage points of the range for samples of n from N(μ, 1)
(Values for which the cumulative probability is P)

[Table body: percentage points for n = 2 (1) 20 (2) 40 (10) 100 and the values of P listed in Section 2.1, to six decimal places.]
70137165 
72570602 







For n = 2, the double integration on the right-hand side of equation (1) can be performed in closed form, giving the results $E(W) = 2/\pi^{1/2}$, $E(W^2) = 2$, $E(W^3) = 8/\pi^{1/2}$, $E(W^4) = 12$, etc. For n ≥ 3, it is necessary to resort to numerical integration; this was done for n = 3 (1) 100, k = 1 (1) 4, using double precision arithmetic on the ERA 1103A computer, employing the seven-point Lagrangian integration formula for the inner integral and the trapezoidal rule for the outer integral (since the integrand tends strongly to zero at both limits). From the values of $E(W^k)$, values of the variance $\sigma_W^2$, the standard deviation $\sigma_W$, and the third and fourth moments about the mean, $\mu_{3,W}$ and $\mu_{4,W}$, were computed. From these in turn the standard third and fourth moments, $\alpha_{3,W}$ and $\alpha_{4,W}$, were computed. For n = 3 (1) 15, intervals h = 0.05 and 2h = 0.10 were used for the numerical integration; for n = 16 (1) 100, the intervals used were h = 0.10 and 2h = 0.20. The agreement between the results for h and 2h indicates that if one retains 10 decimal places (11 significant figures) for E(W), 10 decimal places (10 significant figures) for $\sigma_W^2$, 8 decimal places (8 significant figures) for $\alpha_{3,W}$, and 7 decimal places (8 significant figures) for $\alpha_{4,W}$, the error will not exceed a unit in the last place, except possibly in the case of $\sigma_W^2$ for n = 5 (1) 11. A few corrections of one in the last place of the mean and of up to three in the last place of the variance for sample sizes up through 20 were made in order to make the results agree with values computed from an unpublished 20-decimal-place version (ten decimal places were published) of tables of the expected values of order statistics and products of order statistics computed by Teichroew [25]. The resulting Table 2 was printed on the IBM 407 tabulator.
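The passage from the raw moments $E(W^k)$ to $\sigma_W^2$, $\sigma_W$, $\alpha_{3,W}$ and $\alpha_{4,W}$ uses the usual raw-to-central moment identities. The minimal sketch below applies them to the closed-form n = 2 values quoted above. For n = 2, $W/2^{1/2}$ is a half-normal variate, so the results can be checked against the half-normal formulas $\alpha_3 = 2^{1/2}(4-\pi)/(\pi-2)^{3/2} \approx 0.9953$ and $\alpha_4 = 3 + 8(\pi-3)/(\pi-2)^2 \approx 3.8692$.

```python
import math


def shape_from_raw(m1, m2, m3, m4):
    """Variance, s.d., alpha_3 and alpha_4 from the raw moments E(W^k), k = 1..4."""
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                    # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # fourth central moment
    sd = math.sqrt(var)
    return var, sd, mu3 / sd ** 3, mu4 / var ** 2


# n = 2: closed-form raw moments E(W) = 2/pi^(1/2), E(W^2) = 2, E(W^3) = 8/pi^(1/2), E(W^4) = 12.
rp = math.sqrt(math.pi)
var, sd, a3, a4 = shape_from_raw(2 / rp, 2.0, 8 / rp, 12.0)
print(var, sd, a3, a4)
```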

2.3. Percentage Points of the Studentized Range. The values of the studentized range corresponding to certain cumulative probabilities P were computed, for samples of size n from a normal distribution with ν degrees of freedom for the independent estimate s of the population standard deviation, by inverse interpolation in a new table of the probability integral of the studentized range. The probability integral was computed at varying intervals, small enough to make the table interpolable, for n = 2 (1) 20 (2) 40 (10) 100 and ν = 1 (1) 20, 24, 30, 40, 60, and 120, accurate to within a unit in the sixth decimal place, except for values greater than 0.999995, which are given as 1.00000. The method of computation of the probability integral and the table itself have been given by Harter, Clemm, and Guthrie [7], and will not be included here. The percentage points were computed for the same values of n and ν as the probability integral. Values for ν = ∞, obtained by rounding values from Table 1, are included for convenience in interpolation. Results have been tabulated by Harter, Clemm, and Guthrie [7] for cumulative probability 0.001, 0.005, 0.01, 0.025, 0.05, 0.1 (0.1) 0.9, 0.95, 0.975, 0.99, 0.995, and 0.999, but values will be given here only for the highest six of these values of P. The interpolation necessary to obtain the percentage points was performed on the ERA 1103A computer, using an iterative procedure involving the following steps:

1. In the table of the probability integral of the studentized range for the 





RANGE AND STUDENTIZED RANGE TABLES 1129 


desired values of n and ν, find the two successive probabilities, $y_0$ and $y_1$, between which the required cumulative probability P lies. Call the two corresponding arguments (studentized ranges) $x_0$ and $x_1$, respectively. The required studentized range Q corresponding to cumulative probability P will lie between $x_0$ and $x_1$.

2. Compute the tolerance T for P corresponding to a tolerance $5 \times 10^{u-5}$ for Q by means of the equation $T = (\Delta P/\Delta Q) \times 5 \times 10^{u-5}$, where $\Delta P = y_1 - y_0$, $\Delta Q = x_1 - x_0$, and u is the number of digits before the decimal point in numbers between $x_0$ and $x_1$.

3. Perform linear inverse interpolation to find an approximation z to the required Q, using the relation

$z = x_0 + (x_1 - x_0)(P - y_0)/(y_1 - y_0).$

4. Perform direct interpolation, using Aitken’s method with a tolerance of $5 \times 10^{-7}$ and with provision for up to 16-point interpolation if the tolerance is not met with fewer points, to find the cumulative probability y corresponding to the value z of the studentized range.

5. Compare the result y of step (4) with the required cumulative probability P, using the tolerance T computed in step (2):

a. If $|y - P| \le T$, stop and set Q = z.

b. If $(y - P) > T$, replace $y_1$ by y and $x_1$ by z, then repeat the procedure, starting with step (3).

c. If $(y - P) < -T$, replace $y_0$ by y and $x_0$ by z, then repeat the procedure, starting with step (3).
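Steps 1 to 5 can be sketched in Python as follows. The sketch assumes that the table of the probability integral for fixed n and ν is supplied as two increasing lists: `xs` holds the studentized ranges and `ys` the cumulative probabilities. Aitken's method is represented here by Neville's scheme, a closely related form of iterated interpolation. Three details are assumptions, not taken from the paper: the helper names, the choice of points around z, and the rule of taking u from $x_1$.

```python
import math
from bisect import bisect_left


def neville(xs, ys, x):
    """Value at x of the polynomial through the points (xs[i], ys[i])."""
    p = list(ys)
    size = len(xs)
    for k in range(1, size):
        for i in range(size - k):
            p[i] = ((x - xs[i + k]) * p[i] + (xs[i] - x) * p[i + 1]) / (xs[i] - xs[i + k])
    return p[0]


def direct_interp(xs, ys, x, tol=5e-7, max_points=16):
    """Step 4: add points until two successive estimates agree within tol."""
    j = bisect_left(xs, x)
    previous = None
    for k in range(2, max_points + 1):
        lo = max(0, min(j - k // 2, len(xs) - k))   # k table points around x
        estimate = neville(xs[lo:lo + k], ys[lo:lo + k], x)
        if previous is not None and abs(estimate - previous) <= tol:
            return estimate
        previous = estimate
    return previous


def percentage_point(xs, ys, P, max_iter=100):
    """Steps 1-5: studentized range Q with cumulative probability P."""
    i = bisect_left(ys, P)                          # step 1: ys[i-1] < P <= ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    u = len(str(int(x1))) if x1 >= 1 else 0         # digits before the decimal point
    T = (y1 - y0) / (x1 - x0) * 5 * 10.0 ** (u - 5)  # step 2
    for _ in range(max_iter):
        z = x0 + (x1 - x0) * (P - y0) / (y1 - y0)   # step 3
        y = direct_interp(xs, ys, z)                # step 4
        if abs(y - P) <= T:                         # step 5a
            return z
        if y - P > T:                               # step 5b
            x1, y1 = z, y
        else:                                       # step 5c
            x0, y0 = z, y
    raise RuntimeError("inverse interpolation did not converge")


# Smooth stand-in table (not the studentized range): y = 1 - exp(-x).
xs = [i / 10 for i in range(51)]
ys = [1 - math.exp(-x) for x in xs]
print(percentage_point(xs, ys, 0.9), math.log(10))
```

With the stand-in table, the result agrees with ln 10 to within the tolerance set in step 2.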

The tolerance for the direct interpolation was set at $5 \times 10^{-7}$ so that the interpolation error would not add appreciably to the error already present in the new table of the probability integral of the studentized range. That table is accurate to within a unit in the sixth decimal place, so the interpolated values are substantially as accurate as the values in the input table. Inverse interpolation is, of course, less accurate than direct interpolation, the error being $\Delta Q/\Delta P$ times as great. Thus the tolerance for P was found by multiplying the tolerance for Q ($5 \times 10^{u-5}$) by $1/(\Delta Q/\Delta P) = \Delta P/\Delta Q$. Recall that u is the number of digits before the decimal point in the studentized range interval under consideration. If the ratio of the change in P to the change in Q were constant throughout that interval, this would guarantee that the error in Q would not exceed 5 units in the fifth significant digit for Q ≥ 1, or in the fifth decimal place for Q < 1. This condition (P piecewise linear in Q) is obviously not satisfied in practice, but as long as the weaker condition


(2)  $\max[\Delta P_0/\Delta Q_0,\ \Delta P_1/\Delta Q_1] \le 2\,\Delta P/\Delta Q,$


where $\Delta P_i = |y - y_i|$ and $\Delta Q_i = |z - x_i|$ (i = 0, 1), is satisfied, the error in Q will not exceed a unit in the fourth significant digit for Q ≥ 1, or in the





H. LEON HARTER 


TABLE 2
Moments of the range W for samples of n from N(μ, 1)

[Only the leading digits of E(W) and the variance column are legible in the scan; the remaining columns are not recoverable.]

  n     Mean, E(W)     Variance, $\sigma_W^2$
  2     1.12837        0.72676 04553
  3     1.69256        0.78919 77107
  4     2.05875        0.77406 24738
  5     2.32592        0.74663 76009

  6     2.53441        0.71917 13092
  7     2.70435        0.69423 11313
  8     2.84720        0.67212 36717
  9     2.97002        0.65259 62151
 10     3.07750        0.63526 97762

 11     3.17287        0.61986 43117
 12     3.25845        0.60602 85277
 13     3.33598        0.59354 11244
 14     3.40676        0.58220 42445
 15     3.47182        0.57185 57265

 16     3.53198        0.56236 21426
 17     3.58788        0.55361 30572
 18     3.64006        0.54551 64487
 19     3.68896        0.53799 51043
 20     3.73495        0.53098 37904

 21     3.77833        0.52442 70274
 22     3.81936        0.51827 73314
 23     3.85832        0.51249 38181
 24     3.89534        0.50704 10861
 25     3.93062        0.50188 83188

 26     3.96431        0.49700 85564
 27     3.99653        0.49237 81028
 28     4.02741        0.48797 60384
 29     4.05704        0.48378 38184
 30     4.08552        0.47978 49392

 31     4.11292        0.47596 46599
 32     4.13933        0.47230 97683
 33     4.16481        0.46880 63838
 34     4.18942        0.46544 97900
 35     4.21321        0.46222 42928

 36     4.23624        0.45912 30982
 37     4.25855        0.45613 82080
 38     4.28018        0.45326 23296
 39     4.30117        0.45048 87982
 40     4.32155        0.44781 15084

 41     4.34136        0.44522 48562
 42     4.36063        0.44272 36864
 43     4.37938        0.44030 32479
 44     4.39764        0.43795 91537
 45     4.41543        0.4356 (rest illegible)

 46     4.43279        0.43348 40636
 47     4.44971        0.43134 58177
 48     4.46624        0.42926 93638
 49     4.48237        0.42725 16817
 50     4.49814        0.42528 99556





TABLE 2 (Continued)

  n     Mean, E(W)     Variance, $\sigma_W^2$
 51     4.51356        0.42338 15568
 52     4.52863        0.42152 40275
 53     4.54338        0.41971 50671
 54     4.55782        0.41795 25197
 55     4.57196        0.41623 43619

 56     4.58581        0.41455 86931
 57     4.59939        0.41292 37256
 58     4.61270        0.41132 77766
 59     4.62575        0.40976 92598
 60     4.63855        0.40824 66789

 61     4.65112        0.40675 86207
 62     4.66345        0.40530 37495
 63     4.67556        0.40388 08018
 64     4.68746        0.40248 85808
 65     4.69915        0.40112 59526

 66     4.71064        0.39979 18416
 67     4.72194        0.39848 52266
 68     4.73304        0.39720 51377
 69     4.74397        0.39595 06527
 70     4.75471        0.39472 08940

 71     4.76529        0.39351 50265
 72     4.77570        0.39233 22541
 73     4.78595        0.39117 18179
 74     4.79604        0.39003 29946
 75     4.80598        0.38891 50914

 76     4.81577        0.38781 74497
 77     4.82542        0.38673 94380
 78     4.83493        0.38568 04527
 79     4.84430        0.38463 99165
 80     4.85354        0.38361 72763

 81     4.86266        0.38261 20024
 82     4.87164        0.38162 35871
 83     4.88051        0.38065 15435
 84     4.88925        0.37969 54044
 85     4.89788        0.37875 47214

 86     4.90640        0.37782 90636
 87     4.91481        0.37691 80172
 88     4.92311        0.37602 11844
 89     4.93130        0.37513 81825
 90     4.93940        0.37426 86432

 91     4.94739        0.37341 22122
 92     4.95528        0.37256 85481
 93     4.96308        0.37173 73222
 94     4.97079        0.37091 62175
 95     4.97840        0.37011 09284

 96     4.98593        0.36931 51602
 97     4.99337        0.36853 06286
 98     5.00072        0.36775 70590
 99     5.00799        0.36699 41863
100     (not legible)





TABLE 3
Percentage points of the studentized range for samples of n from $N(\mu, \sigma^2)$ with ν degrees of freedom for independent estimate $s^2$ of $\sigma^2$
(Values of the studentized range Q corresponding to cumulative probability P)

[The table is printed on successive pages for P = .90, .95, .975, .99, .995, and .999; its entries are not legible in the scan.]







fourth decimal place for Q < 1. Condition (2) is in fact satisfied. Hence, when the percentage points are rounded to four significant digits or four decimal places, whichever is less accurate, their error does not exceed a unit in the last place.

For ν = 1 and n ≥ 6, the value of Q corresponding to cumulative probability P = 0.999 exceeds 2000. Since the new table of the probability integral of the studentized range extends only to Q = 2000, those percentage points which exceed 2000 cannot be found by interpolation. It has been shown, however, that for ν = 1 and ν = 2 and for $Q_0$ sufficiently large, the cumulative probability P(Q, ν, n) can be approximated by

(3)  $P(Q, \nu, n) \approx 1 - (Q_0/Q)^{\nu}\,[1 - P(Q_0, \nu, n)].$

Here P(Q, ν, n) is the cumulative probability corresponding to the value Q of the studentized range of a normal sample of size n, with ν degrees of freedom. Hence the percentage points in question can be found by setting $Q_0 = 2000$ in equation (3) and solving for Q.
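Solving (3) for Q gives

$Q = Q_0\{[1 - P(Q_0, \nu, n)]/(1 - P)\}^{1/\nu}.$

A minimal sketch follows. The tabulated value $P(Q_0, \nu, n)$ at $Q_0 = 2000$ must be supplied by the user. The number used in the example is hypothetical, not a value from the table.

```python
def extrapolated_point(P, nu, P_at_Q0, Q0=2000.0):
    """Solve equation (3), P = 1 - (Q0/Q)**nu * (1 - P(Q0, nu, n)), for Q."""
    if not 0.0 < P_at_Q0 < P < 1.0:
        raise ValueError("need 0 < P(Q0) < P < 1")
    return Q0 * ((1.0 - P_at_Q0) / (1.0 - P)) ** (1.0 / nu)


# Hypothetical tail value at Q0 = 2000 (not taken from the table):
print(extrapolated_point(0.999, 1, 0.998))
```

For ν = 1, halving the upper-tail probability doubles Q, which reflects the $Q^{-\nu}$ tail behaviour assumed in (3).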

The percentage points of the studentized range are given by Table 3, which 
was printed on the IBM 407 tabulator. 

In addition to the percentage points shown in Table 3, critical values for 
Duncan’s new multiple range test, which are percentage points of the studentized 
range for special protection levels based upon degrees of freedom, have been 
computed and will be published elsewhere (see Harter [5]). 


3. Interpolation in the Tables. 

3.1. Percentage Points of the Range. Table 1 gives the percentage points of the 
range for samples of size n = 2 (1) 20 (2) 40 (10) 100. One may wish to inter- 
polate for odd values of n between 20 and 40 and/or for values of n, not multiples 
of ten, between 40 and 100. Harmonic interpolation in n (interpolation using 1/n 
as the independent variable) is recommended. The maximum errors, in units of 
the sixth decimal place, are approximately as follows: 


Maximum error of harmonic interpolation (units of the sixth decimal place)

Type of interpolation    20 < n < 40    40 < n < 80    80 < n < 100
Linear
3-point
4-point
5-point
6-point
7-point
8-point

[Table entries not legible in the scan.]


P-wise interpolation is not recommended. If percentage points are needed for values of P not included in Table 1, the best procedure is to interpolate in the table of the probability integral of the range (see [6]), by the method outlined in Section 2.1.
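Harmonic interpolation is ordinary Lagrange interpolation with 1/n as the independent variable. The sketch below assumes the caller supplies the tabulated sample sizes nearest the target n, together with their tabulated values. The function name and the numbers in the example are illustrative, not entries of Table 1.

```python
def harmonic_interp(ns, values, n):
    """Lagrange interpolation in x = 1/n through the points (1/ns[i], values[i])."""
    xs = [1.0 / k for k in ns]
    x = 1.0 / n
    total = 0.0
    for i, (xi, vi) in enumerate(zip(xs, values)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += basis * vi
    return total


# Linear harmonic interpolation between hypothetical entries for n = 30 and n = 40:
print(harmonic_interp((30, 40), [3.0 + 10.0 / 30, 3.0 + 10.0 / 40], 35))
```

When the tabulated quantity is exactly linear in 1/n, linear harmonic interpolation reproduces it. More points (3-point, 4-point, ...) are used when the dependence on 1/n is curved.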







3.2. Percentage Points of the Studentized Range. Table 3 gives the percentage points of the studentized range for samples of size n = 2 (1) 20 (2) 40 (10) 100, with degrees of freedom ν = 1 (1) 20, 24, 30, 40, 60, 120, ∞. One may wish to interpolate n-wise and/or ν-wise. For n-wise interpolation, the maximum errors, in units of the fourth significant digit, are approximately as follows:


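Type of interpolation    Maximum error (units of the fourth significant digit)
Linear
3-point
4-point

[Column headings and table entries not legible in the scan.]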


Linear harmonic ν-wise interpolation* (linear interpolation in 1/ν) has the following accuracy in the fourth significant digit:

- within 4 units for P = .999;
- within 2 units for P = .995 and .99;
- within 1 unit for the other values of P.

Three-point harmonic ν-wise interpolation is accurate to within 1 unit in the fourth significant digit for all values of P. For convenience in harmonic interpolation, the values of ν included in the table were chosen to form a harmonic series (20, 24, 30, 40, 60, 120, ∞). That is, 120/ν takes the equally spaced values 6, 5, 4, 3, 2, 1, 0. As with the percentage points of the range, P-wise interpolation is not recommended. If percentage points are needed for values of P not included in Table 3, the best procedure is to interpolate in the table of the probability integral of the studentized range (see [7]), by the method outlined in Section 2.3.
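Harmonic ν-wise interpolation can use x = 120/ν as the argument, with ν = ∞ mapped to x = 0, because the tabulated ν then fall on equally spaced values of x. Below is a sketch of three-point harmonic interpolation under that convention. The scaling by 120, the function name, and the sample values are assumptions for illustration.

```python
import math


def nu_harmonic_interp(nus, values, nu):
    """Lagrange interpolation in x = 120/nu, with nu = math.inf mapped to x = 0."""
    def x_of(v):
        return 0.0 if math.isinf(v) else 120.0 / v

    xs = [x_of(v) for v in nus]
    x = x_of(nu)
    total = 0.0
    for i, vi in enumerate(values):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xs[i] - xj)
        total += basis * vi
    return total


# Hypothetical entries at nu = 60, 120 and infinity, interpolated to nu = 80:
print(nu_harmonic_interp((60, 120, math.inf), [2.8, 2.35, 2.0], 80))
```

Mapping ν = ∞ to x = 0 lets the ∞ column be used as an ordinary interpolation node.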


Acknowledgments. The author gratefully acknowledges the help given by the following persons:

- Dr. Gertrude Blanch, who rendered invaluable assistance in the numerical analysis;
- Mr. Donald S. Clemm and Mr. Eugene Guthrie, who programmed most of the computations for the Univac Scientific computer;
- Major John V. Armitage, who suggested the iterative method of inverse interpolation employed;
- Professors H. O. Hartley, J. W. Tukey and D. B. Duncan, the referees, and the Editor, who made helpful suggestions;
- Professor Daniel Teichroew, who made available one of his unpublished tables.


REFERENCES

[1] H. A. David, “Further applications of range to the analysis of variance,” Biometrika, Vol. 38 (1951), pp. 393-407.

[2] David B. Duncan, “Multiple range and multiple F tests,” Biometrics, Vol. 11 (1955), pp. 1-42.

[3] Frank E. Grubbs and Chalmers L. Weaver, “The best unbiased estimate of population standard deviation based on group ranges,” J. Amer. Stat. Assn., Vol. 42 (1947), pp. 224-241.


* This statement and the following one about ν-wise interpolation were intended to apply to interpolation for integral values of ν not included in the table. They apply also to fractional ν for ν > 20, but not to fractional ν for small ν.







[4] H. Leon Harter, “Error rates and sample sizes for range tests in multiple comparisons,” Biometrics, Vol. 13 (1957), pp. 511-536.

[5] H. Leon Harter, “Critical values for Duncan’s new multiple range test,” Biometrics (to appear in December, 1960).

[6]* H. Leon Harter and Donald S. Clemm, “The Probability Integrals of the Range and of the Studentized Range—Probability Integral, Percentage Points, and Moments of the Range,” Wright Air Development Center Technical Report 58-484, Vol. I, 1959. (ASTIA Document No. AD215024)

[7]* H. Leon Harter, Donald S. Clemm, and Eugene H. Guthrie, “The Probability Integrals of the Range and of the Studentized Range—Probability Integral and Percentage Points of the Studentized Range; Critical Values for Duncan’s New Multiple Range Test,” Wright Air Development Center Technical Report 58-484, Vol. II, 1959. (ASTIA Document No. AD231733)

[8] H. O. Hartley, “The range in random samples,” Biometrika, Vol. 32 (1942), pp. 334-348.

[9] H. O. Hartley, “Corrigenda (1) Tables of percentage points of the ‘studentized’ range,” Biometrika, Vol. 40 (1953), p. 236.

[10] H. O. Hartley, “Use of range in analysis of variance,” Biometrika, Vol. 37 (1950), pp. 271-280.

[11] M. Keuls, “The use of the ‘studentized range’ in connection with an analysis of variance,” Euphytica, Vol. 1 (1952), pp. 112-122.

[12] Joyce M. May, “Extended and corrected tables of the upper percentage points of the ‘studentized’ range,” Biometrika, Vol. 39 (1952), pp. 192-193.

[13] A. T. McKay and E. S. Pearson, “A note on the distribution of range in samples of n,” Biometrika, Vol. 25 (1933), pp. 415-420.

[14] D. Newman, “The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation,” Biometrika, Vol. 31 (1939), pp. 20-30.

[15] P. B. Patnaik, “The use of mean range in statistical tests,” Biometrika, Vol. 37 (1950), pp. 78-87.

[16] E. S. Pearson, “A further note on the distribution of range in samples taken from a normal population,” Biometrika, Vol. 18 (1926), pp. 173-194.

[17] Egon S. Pearson, “The percentage limits for the distribution of range in samples from a normal population (n ≤ 100),” Biometrika, Vol. 24 (1932), pp. 404-417.

[18] E. S. Pearson, The Application of Statistical Methods to Industrial Standardisation and Quality Control, British Standards Institution No. 600, London, 1935.

[19] E. S. Pearson, “‘Student’ as statistician,” Biometrika, Vol. 30 (1938), pp. 210-250.

[20] E. S. Pearson and H. O. Hartley, “The probability integral of the range in samples of n observations from a normal population,” Biometrika, Vol. 32 (1942), pp. 301-310.

[21] E. S. Pearson and H. O. Hartley, “Tables of the probability integral of the ‘studentised’ range,” Biometrika, Vol. 33 (1943), pp. 89-99.

[22] J. W. Roperrs, “Associated statistical techniques,” Symposium on Statistical Quality Control, Ministry of Production, Birmingham, 1944.

[23] W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van Nostrand Company, Inc., New York, 1931.

[24] “Student,” “Errors of routine analysis,” Biometrika, Vol. 19 (1927), pp. 151-164.

[25] D. Teichroew, “Tables of expected values of order statistics and products of order statistics for samples of size twenty and less from the normal distribution,” Ann. Math. Stat., Vol. 27 (1956), pp. 410-426.

[26] L. H. C. Tippett, “On the extreme individuals and the range of samples taken from a normal population,” Biometrika, Vol. 17 (1925), pp. 364-387.

[27] J. W. Tukey, “Allowances for various types of error rates,” unpublished invited address presented at the Blacksburg meeting of the Institute of Mathematical Statistics, 1952.

[28] J. W. Tukey, “The Problem of Multiple Comparisons,” unpublished memorandum in private circulation, 1953.

* Available to the public from Office of Technical Services, U. S. Department of Commerce, Washington 25, D. C., or to qualified requesters from Armed Services Technical Information Agency, Arlington Hall Station, Arlington 12, Virginia.





ASYMPTOTIC FORMULAE FOR THE DISTRIBUTION OF
HOTELLING’S GENERALIZED $T_0^2$ STATISTIC. II


By Koichi Ito
Nanzan University, Nagoya, Japan


1. Summary. In this paper an asymptotic formula is derived for the cumulative distribution function (c.d.f.) of Hotelling’s generalized $T_0^2$ statistic in the non-null case, for general values of the number of dimensions. This result includes as a special case the author’s previous result for the null distribution of $T_0^2$ [5]. It also shows certain properties of $T_0^2$ which help determine the power of the test based on the statistic for moderately large samples. Taken together, the author’s two results provide an approximately complete analysis of the $T_0^2$ test, although the exact null and non-null distributions of $T_0^2$ are not available at present.


2. Introduction. Consider

(2.1)  $H_0: \xi(p \times m) = 0(p \times m)$ against $H: \xi \neq 0$

under the probability law

(2.2)  $\{(2\pi)^{p(m+n)/2}|\Sigma|^{(m+n)/2}\}^{-1}\exp[-\tfrac12\operatorname{tr}\Sigma^{-1}\{(X_1 - \xi)(X_1 - \xi)' + X_0X_0'\}]\,dX_1\,dX_0.$

Here $X_1$ and $X_0$ are $p \times m$ and $p \times n$ matrices, respectively. Either $p \le m$ or $p > m$ is allowed, but $p \le n$. $\Sigma$ is an unknown $p \times p$ symmetric, positive definite matrix. To test (2.1), Hotelling [3] proposed the statistic $T_0^2 = m \operatorname{tr} S_1S_0^{-1}$, where $S_1 = X_1X_1'/m$, $S_0 = X_0X_0'/n$, and the prime denotes transposition of a matrix. The exact distribution of this statistic is known only when p = 2 and $\xi = 0$ ([3], p. 35); for general values of p it is not available even in the null case. The author [5] gave asymptotic formulae for the percentage points and for the c.d.f. of $T_0^2$ for general values of p when $\xi = 0$, and Siotani [9] independently obtained similar results. In the following sections we derive an asymptotic expansion of the c.d.f. of $T_0^2$ for general values of p when $\xi \neq 0$. It covers the author’s previous result for $\xi = 0$ as a special case.

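Let the non-zero characteristic roots of $\xi\xi'\Sigma^{-1}$ be denoted by $\lambda_1, \ldots, \lambda_q$, where q is the rank of $\xi\xi'$, $q \le p$, $q \le m$, and

(2.3)  $\lambda = \sum_{j=1}^{q}\lambda_j \;(= \operatorname{tr}\xi\xi'\Sigma^{-1}).$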


It is easily seen that $\lambda \ge 0$, with equality if and only if $H_0$ of (2.1) is true. The distribution of $T_0^2$ when $H_0$ is not true involves no parameters other than p, m, n and the $\lambda_j$’s. Let the non-null c.d.f. of $T_0^2$ be denoted by

(2.4)  $\Pr\{T_0^2 \le 2\theta \mid p, m, n; \lambda_1, \ldots, \lambda_q\} = F_{p,m,n}(2\theta \mid \lambda_1, \ldots, \lambda_q).$

Received July 25, 1959; revised February 23, 1960.
Now, generalizing (4.2) of [5], we have

(2.5)  $\Pr\{m \operatorname{tr} S_1S_0^{-1} \le 2\theta \mid p, m, n; \lambda_1, \ldots, \lambda_q\} = \int_R \Pr\{m \operatorname{tr} S_1S_0^{-1} \le 2\theta \mid S_0; p, m, n; \lambda_1, \ldots, \lambda_q\}\,\Pr\{dS_0\}.$

The first expression on the right denotes the conditional probability of the relation indicated for fixed values of the elements of $S_0$. The second denotes the probability element of $S_0$, which has a Wishart distribution with n degrees of freedom. The domain of integration R consists of all values of the elements of $S_0$ for which $S_0$ is positive semi-definite. Expanding the former about the elements of $\Sigma$ in a Taylor series and integrating term by term yields

(2.6)  $F_{p,m,n}(2\theta \mid \lambda_1, \ldots, \lambda_q) = \mathcal{D}\,\Pr\{m \operatorname{tr} S_1\Sigma^{-1} \le 2\theta \mid mp; \lambda\}.$

Here $\mathcal{D}$ is the asymptotic series in n of the derivative operators given by (3.12) of [5]. The second expression on the right is the probability of the relation indicated when $m \operatorname{tr} S_1\Sigma^{-1}$ is distributed as a non-central $\chi^2$ with mp degrees of freedom and deviation parameter λ of (2.3). This probability is known to be a function of 2θ, mp and λ only, free of n [8]. Equation (2.6) is one of what are conveniently called “Studentization formulae” [10]. When λ = 0, (2.6) reduces to

(2.7)  $F_{p,m,n}(2\theta \mid 0, \ldots, 0) = \mathcal{D}\,\Pr\{m \operatorname{tr} S_1\Sigma^{-1} \le 2\theta \mid mp; 0\},$

and the author used James’ method ([6], [7]) to express the right-hand side of (2.7) as an asymptotic series, (4.3) of [5].

If we pursue the same approach when λ > 0, however, the algebraic labor required to obtain an explicit expression for the asymptotic series on the right of (2.6) becomes prohibitive. We shall therefore first obtain an asymptotic series for the characteristic function of the non-null $T_0^2$ distribution for large values of n. This series is then inverted to derive an asymptotic expansion of the non-null c.d.f. of $T_0^2$.


3. Characteristic function of the non-null $T_0^2$ distribution. Let the characteristic function of the non-null $T_0^2$ distribution be denoted by

(3.1)  $\phi_{p,m,n}(t \mid \lambda_1, \ldots, \lambda_q) = \int \exp(itT_0^2)\,dF_{p,m,n}(T_0^2 \mid \lambda_1, \ldots, \lambda_q).$

Hsu ([4], p. 232) gave an expression for the Laplace transform of a function of non-null $T_0^2$. By means of it, (3.1) can be expressed as a multiple integral as follows:


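(3.2)  $\phi_{p,m,n}(t \mid \lambda_1, \ldots, \lambda_q) = (n\pi)^{-mp/2}\prod_{r=1}^{p}\frac{\Gamma\{\frac12(n+1+m-r)\}}{\Gamma\{\frac12(n+1-r)\}}\int_D\Big|I + \frac{A}{n}\Big|^{-\frac12(m+n)}\exp\Big\{it(\operatorname{tr}A) + \sqrt{2it}\sum_{j=1}^{q}\sqrt{\lambda_j}\,z_{jj}\Big\}\prod_{r=1}^{p}\prod_{s=1}^{m}dz_{rs},$

where I is the $p \times p$ unit matrix and $A(p \times p) = X(p \times m)X'(m \times p)$, the (r, s)-element of X being $z_{rs}$. The domain of integration D is $-\infty < z_{rs} < +\infty$, $r = 1, \ldots, p$; $s = 1, \ldots, m$. Now, making use of the generalized Stirling formula ([2], p. 130),

(3.3)  $\Gamma(y) = (2\pi)^{1/2}y^{y-1/2}\{1 + (1/12y) + O(y^{-2})\}\exp(-y)$ for $y > 0$,

it is easily seen that

(3.4)  $n^{-m/2}\,\Gamma\{\tfrac12(n+1+m-r)\}/\Gamma\{\tfrac12(n+1-r)\} = 2^{-m/2}\{1 + (m^2 - 2mr)/4n + O(n^{-2})\}.$

Hence the constant in front of the integral in (3.2) becomes

(3.5)  $(n\pi)^{-mp/2}\prod_{r=1}^{p}\Gamma\{\tfrac12(n+1+m-r)\}/\Gamma\{\tfrac12(n+1-r)\} = (2\pi)^{-mp/2}\{1 + mp(m-p-1)/4n + O(n^{-2})\}.$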
On the other hand,

(3.6)  $|I + A/n|^{-\frac12(m+n)} = [1 + \{(\operatorname{tr}_1 A)^2 - 2\operatorname{tr}_2 A - 2m\operatorname{tr}_1 A\}/4n + O(n^{-2})]\exp(-\tfrac12\operatorname{tr}_1 A).$

Here $\operatorname{tr}_k Q$, for a square matrix Q, stands for the sum of all k × k minors formed by the intersection of any k rows of Q with the k columns bearing the same numbers. In particular, $\operatorname{tr}_1 Q$ is the sum of all principal diagonal elements of Q, i.e. tr Q.
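For a symmetric A with eigenvalues $a_i$, the quantities in (3.6) reduce to functions of the eigenvalues:

- $|I + A/n| = \prod_i(1 + a_i/n)$;
- $\operatorname{tr}A = \sum_i a_i$;
- $\operatorname{tr}_2 A = \sum_{i<j}a_ia_j$.

Expansion (3.6) can therefore be checked directly on a set of eigenvalues. A small sketch follows, with eigenvalues chosen arbitrarily for illustration.

```python
import math
from itertools import combinations


def lhs(eigs, m, n):
    """|I + A/n|**(-(m+n)/2), computed from the eigenvalues of A."""
    log_det = sum(math.log1p(a / n) for a in eigs)
    return math.exp(-0.5 * (m + n) * log_det)


def rhs(eigs, m, n):
    """Two-term expansion (3.6), without the O(n**-2) term."""
    tr1 = sum(eigs)
    tr2 = sum(a * b for a, b in combinations(eigs, 2))   # sum of 2 x 2 principal minors
    return (1.0 + (tr1 ** 2 - 2 * tr2 - 2 * m * tr1) / (4.0 * n)) * math.exp(-0.5 * tr1)


eigs = [1.0, 2.0, 0.5]   # arbitrary eigenvalues of A
print(lhs(eigs, 4, 10 ** 4), rhs(eigs, 4, 10 ** 4))
```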
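Hence (3.2) becomes

(3.7)  $\phi_{p,m,n}(t \mid \lambda_1, \ldots, \lambda_q) = (2\pi)^{-mp/2}\{1 + mp(m-p-1)/4n + O(n^{-2})\}\int_D[1 + \{(\operatorname{tr}A)^2 - 2\operatorname{tr}_2 A - 2m\operatorname{tr}A\}/4n + O(n^{-2})]\exp\Big\{-\tfrac12(1-2it)\operatorname{tr}A + \sqrt{2it}\sum_{j=1}^{q}\sqrt{\lambda_j}\,z_{jj}\Big\}\prod_{r=1}^{p}\prod_{s=1}^{m}dz_{rs}.$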


Now, after some transformations ([4], p. 227), the probability law of $X_1(p \times m)$ given in (2.2) becomes

(3.8)  $(2\pi)^{-mp/2}\exp(-\tfrac12\lambda)\exp\Big\{-\tfrac12\operatorname{tr}A + \sum_{j=1}^{q}\sqrt{\lambda_j}\,z_{jj}\Big\}\prod_{r=1}^{p}\prod_{s=1}^{m}dz_{rs},$

where λ and the $\lambda_j$’s are given in (2.3), and the $z_{rs}$’s and A are the same as in (3.2). With respect to this probability law it is easily shown that
(3.9)

$E\{\exp g\} = (1-2it)^{-mp/2}\exp\{-\tfrac12\lambda + it\lambda/(1-2it)\},$

$E\{(\operatorname{tr}A)\exp g\} = (1-2it)^{-mp/2-1}\{mp + 2it\lambda/(1-2it)\}\exp\{-\tfrac12\lambda + it\lambda/(1-2it)\},$

$E\{(\operatorname{tr}A)^2\exp g\} = (1-2it)^{-mp/2-2}\{mp(mp+2) + 4(mp+2)\,it\lambda/(1-2it) + [2it\lambda/(1-2it)]^2\}\exp\{-\tfrac12\lambda + it\lambda/(1-2it)\},$

$E\{(\operatorname{tr}_2 A)\exp g\} = (1-2it)^{-mp/2-2}\{\tfrac12 mp(m-1)(p-1) + (m-1)(p-1)\,2it\lambda/(1-2it) + [2it/(1-2it)]^2\sum_{j<k}\lambda_j\lambda_k\}\exp\{-\tfrac12\lambda + it\lambda/(1-2it)\},$

where E stands for the expectation with respect to (3.8), and

$g = g(z_{rs}\text{'s}, it, \lambda_1, \ldots, \lambda_q) = it(\operatorname{tr}A) + (\sqrt{2it} - 1)\sum_{j=1}^{q}\sqrt{\lambda_j}\,z_{jj}.$
j= 
Substituting (3.9) in (3.7) and integrating term by term yields

(3.10)  $\phi_{p,m,n}(t \mid \lambda_1, \ldots, \lambda_q) = (1-2it)^{-mp/2}\Big[1 + \Big\{mp(m-p-1) - 2m\Big(mp + \frac{2it\lambda}{1-2it}\Big)(1-2it)^{-1} + \Big[mp(m+p+1) + 4(m+p+1)\frac{it\lambda}{1-2it} + \Big(\frac{2it}{1-2it}\Big)^2\sum_{j=1}^{q}\lambda_j^2\Big](1-2it)^{-2}\Big\}\Big/4n + O(n^{-2})\Big]\exp\Big(\frac{it\lambda}{1-2it}\Big),$

which is an asymptotic expansion of the characteristic function of the non-null $T_0^2$ distribution for large values of n.





4. Asymptotic expansion of the non-null c.d.f. of $T_0^2$. Let F(x) be the c.d.f. of a statistic and φ(t) its characteristic function. Suppose that all the derivatives of F exist for all x in the range of x and vanish at the extremes of that range. Then it is well known (see, e.g., [10], p. 638) that, by integration by parts, $(-it)^r\phi(t)$ is the characteristic function of the rth derivative $F^{(r)}(x)$ of F(x), i.e.,

(4.1)  $\int \exp(itx)\,dF^{(r)}(x) = (-it)^r\phi(t).$

Now denote the c.d.f. of a non-central $\chi^2$ with f degrees of freedom and deviation parameter λ by $F_f(2\theta \mid \lambda)$, and its rth derivative by $F_f^{(r)}(2\theta \mid \lambda)$. Their characteristic functions are $\phi_f(t \mid \lambda)$ and $(-it)^r\phi_f(t \mid \lambda)$, respectively. These functions satisfy the above conditions, and hence (3.10) can be written as
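(4.2)  $\phi_{p,m,n}(t \mid \lambda_1, \ldots, \lambda_q) = \phi_{mp}(t \mid \lambda) + \Big\{mp(m-p-1)\phi_{mp}(t \mid \lambda) - 2m^2p\,\phi_{mp+2}(t \mid \lambda) + mp(m+p+1)\phi_{mp+4}(t \mid \lambda) + 4m\lambda(-it)\phi_{mp+4}(t \mid \lambda) - 4(m+p+1)\lambda(-it)\phi_{mp+6}(t \mid \lambda) + 4\sum_{j=1}^{q}\lambda_j^2(-it)^2\phi_{mp+8}(t \mid \lambda)\Big\}\Big/4n + O(n^{-2}).$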
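The characteristic function of a non-central $\chi^2$ with f degrees of freedom is

$\phi_f(t \mid \lambda) = (1 - 2it)^{-f/2}\exp\{it\lambda/(1 - 2it)\}.$

With this, one can confirm numerically that (4.2) is only a rewriting of (3.10). The sketch below evaluates both truncated expansions, dropping the $O(n^{-2})$ term. The parameter values and function names are arbitrary choices for illustration.

```python
import cmath


def phi_chi2(t, f, lam):
    """Characteristic function of a non-central chi-square with f d.f. and parameter lam."""
    return (1 - 2j * t) ** (-f / 2) * cmath.exp(1j * t * lam / (1 - 2j * t))


def phi_310(t, p, m, n, lams):
    """Expansion (3.10) without the O(n**-2) term."""
    lam = sum(lams)
    s = 1 / (1 - 2j * t)
    eta = 1j * t * lam * s                       # it*lambda / (1 - 2it)
    bracket = (m * p * (m - p - 1)
               - 2 * m * (m * p + 2 * eta) * s
               + (m * p * (m + p + 1) + 4 * (m + p + 1) * eta
                  + (2j * t * s) ** 2 * sum(l * l for l in lams)) * s ** 2)
    return (1 - 2j * t) ** (-m * p / 2) * (1 + bracket / (4 * n)) * cmath.exp(eta)


def phi_42(t, p, m, n, lams):
    """The same expansion written as (4.2), in terms of non-central chi-square c.f.'s."""
    lam = sum(lams)
    mp = m * p

    def f(k):
        return phi_chi2(t, k, lam)

    corr = (mp * (m - p - 1) * f(mp) - 2 * m * m * p * f(mp + 2)
            + mp * (m + p + 1) * f(mp + 4)
            + 4 * m * lam * (-1j * t) * f(mp + 4)
            - 4 * (m + p + 1) * lam * (-1j * t) * f(mp + 6)
            + 4 * sum(l * l for l in lams) * (-1j * t) ** 2 * f(mp + 8))
    return f(mp) + corr / (4 * n)


print(phi_310(0.3, 2, 3, 40, [1.5, 0.25]), phi_42(0.3, 2, 3, 40, [1.5, 0.25]))
```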
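Inversion of (4.2) gives

(4.3)  $F_{p,m,n}(2\theta \mid \lambda_1, \ldots, \lambda_q) = F_{mp}(2\theta \mid \lambda) + \Big\{mp(m-p-1)F_{mp}(2\theta \mid \lambda) - 2m^2p\,F_{mp+2}(2\theta \mid \lambda) + mp(m+p+1)F_{mp+4}(2\theta \mid \lambda) + 4m\lambda F^{(1)}_{mp+4}(2\theta \mid \lambda) - 4(m+p+1)\lambda F^{(1)}_{mp+6}(2\theta \mid \lambda) + 4\sum_{j=1}^{q}\lambda_j^2 F^{(2)}_{mp+8}(2\theta \mid \lambda)\Big\}\Big/4n + O(n^{-2}).$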

Hence we have proved the following.

Theorem. The non-null c.d.f. of the statistic $T_0^2 = m \operatorname{tr} S_1S_0^{-1}$, where $S_1$ and $S_0$ are subject to the probability law (2.2), has the asymptotic expansion (4.3) for large values of n.
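Evaluating (4.3) numerically requires the non-central $\chi^2$ c.d.f. and its first two derivatives. The sketch below builds them as follows:

- each quantity is a Poisson mixture of central $\chi^2$ quantities, with weights $e^{-\lambda/2}(\lambda/2)^j/j!$ (a standard representation);
- the central c.d.f. is computed from the power series of the regularized incomplete gamma function;
- the pieces are combined as in (4.3), with the $O(n^{-2})$ term dropped.

Three choices are assumptions: the truncation at 200 mixture terms, the stopping rule of the series, and all function names. For small n the two-term expansion may be inaccurate.

```python
import math


def chi2_cdf(x, k):
    """Central chi-square c.d.f.: regularized P(k/2, x/2) by its power series."""
    if x <= 0:
        return 0.0
    s, y = 0.5 * k, 0.5 * x
    term = math.exp(s * math.log(y) - y - math.lgamma(s + 1))
    total, i = term, 1
    while term > 1e-17 * total:
        term *= y / (s + i)
        total += term
        i += 1
    return min(total, 1.0)


def chi2_pdf(x, k):
    """Central chi-square density with k d.f., for x > 0."""
    s = 0.5 * k
    return math.exp((s - 1) * math.log(0.5 * x) - 0.5 * x - math.lgamma(s)) / 2


def chi2_pdf_deriv(x, k):
    """Derivative in x of the central chi-square density."""
    return chi2_pdf(x, k) * ((0.5 * k - 1) / x - 0.5)


def noncentral(fn, x, k, lam, terms=200):
    """Poisson mixture: sum over j of e^(-lam/2) (lam/2)^j / j! * fn(x, k + 2j)."""
    h = 0.5 * lam
    weight, total = math.exp(-h), 0.0
    for j in range(terms):
        if weight == 0.0:
            break
        total += weight * fn(x, k + 2 * j)
        weight *= h / (j + 1)
    return total


def t0sq_cdf(x, p, m, n, lams):
    """Two-term expansion (4.3) of Pr{T_0^2 <= x}; x plays the role of 2*theta."""
    lam = sum(lams)
    sum_sq = sum(l * l for l in lams)
    mp = m * p

    def F(k, fn=chi2_cdf):
        return noncentral(fn, x, k, lam)

    corr = (mp * (m - p - 1) * F(mp) - 2 * m * m * p * F(mp + 2)
            + mp * (m + p + 1) * F(mp + 4)
            + 4 * m * lam * F(mp + 4, chi2_pdf)
            - 4 * (m + p + 1) * lam * F(mp + 6, chi2_pdf)
            + 4 * sum_sq * F(mp + 8, chi2_pdf_deriv))
    return F(mp) + corr / (4 * n)


print(t0sq_cdf(15.0, 2, 3, 50, [1.0, 0.5]))
```

The coefficients of the undifferentiated terms satisfy $mp(m-p-1) - 2m^2p + mp(m+p+1) = 0$. Consequently, the approximation tends to 1 as x grows, as a c.d.f. should.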

When λ = 0, (4.3) becomes an asymptotic expansion of the null c.d.f. of $T_0^2$, i.e.,

(4.4)  $F_{p,m,n}(2\theta \mid 0, \ldots, 0) = F_{mp}(2\theta \mid 0) + mp\{(m-p-1)F_{mp}(2\theta \mid 0) - 2mF_{mp+2}(2\theta \mid 0) + (m+p+1)F_{mp+4}(2\theta \mid 0)\}/4n + O(n^{-2}),$

which is equivalent to (4.3) of [5]. Obtaining further terms of this expansion would require more systematic algebraic work.







5. Remarks. Many statistics that are functions of the characteristic roots of $mS_1S_0^{-1}$ have been proposed for testing (2.1) under (2.2). Hotelling’s $T_0^2$ statistic, the sum of these characteristic roots, is one of them. Unfortunately, there is at present no theory to tell how to choose among different test statistics when we want power against certain kinds of alternatives ([1], p. 224), because far too little is known about their relative merits. In this situation, formulae (4.3) and (4.4) provide an approximately complete analysis of the test of significance based on $T_0^2$ for large values of n. They are used together with (3.33) of [5], which gives an asymptotic formula for the percentage points of the null $T_0^2$ distribution. For large values of n, (4.3) shows that the non-null c.d.f. of $T_0^2$ depends on the $\lambda_j$’s only through $\sum_{j=1}^{q}\lambda_j$ and $\sum_{j<k}\lambda_j\lambda_k$, since $\sum_j\lambda_j^2 = \lambda^2 - 2\sum_{j<k}\lambda_j\lambda_k$. It is conjectured, however, that for small values of n the exact non-null c.d.f. of $T_0^2$ involves all the symmetric functions of the $\lambda_j$’s. If results similar to (4.3) are obtained for other test statistics, they may throw some light on the question of which statistic should be used consistently in carrying out actual tests of significance.

The author wishes to express his indebtedness to Prof. W. Kruskal and the 
referee for their comments and suggestions which led to this revision of the 
original work. 


REFERENCES

[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, New York, 1958.

[2] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, N. J., 1951.

[3] Harold Hotelling, “A generalized T-test and measure of multivariate dispersion,” Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1951, pp. 23-41.

[4] P. L. Hsu, “On generalized analysis of variance,” Biometrika, Vol. 31 (1940), pp. 221-237.

[5] Koichi Ito, “Asymptotic formulae for the distribution of Hotelling’s generalized $T_0^2$ statistic,” Ann. Math. Stat., Vol. 27 (1956), pp. 1091-1105.

[6] G. S. James, “The comparison of several groups of observations when the ratios of the population variances are unknown,” Biometrika, Vol. 38 (1951), pp. 324-329.

[7] G. S. James, “Tests of linear hypotheses in univariate and multivariate analysis when the ratios of the population variances are unknown,” Biometrika, Vol. 41 (1954), pp. 19-43.

[8] P. B. Patnaik, “The non-central $\chi^2$- and F-distributions and their applications,” Biometrika, Vol. 36 (1949), pp. 202-232.

[9] Minoru Siotani, “On the distribution of the Hotelling’s $T^2$-statistics,” Ann. Inst. Stat. Math., Vol. 8 (1956), pp. 1-14.

[10] David L. Wallace, “Asymptotic approximations to distributions,” Ann. Math. Stat., Vol. 29 (1958), pp. 635-654.





ON UNBIASED ESTIMATION¹


By L. Schmetterer²
University of California, Berkeley


The theory of unbiased estimation has been developed mainly for quadratic loss functions. The purpose of the present paper is to generalize this theory to convex loss functions, and especially to loss functions which are pth powers (p ≥ 1). The treatment of these cases requires, in part, tools quite different from those used in the quadratic case. Theorems of Stein and Bahadur are generalized. The contents of the paper are, however, related in some respects to results previously obtained by Barankin.

Let (R, S) be a measurable space and let $\mathfrak{P}$ be a nonempty class of probability measures P on S. Let g be any real-valued function from $\mathfrak{P}$ into the euclidean space $R_1$. A real-valued measurable function h on R, for which $\int_R h\,dP$ exists for all $P \in \mathfrak{P}$, is called an unbiased estimator for g if

(1)  $E(h; P) = \int_R h\,dP = g(P)$

for all $P \in \mathfrak{P}$.
The set of all h’s which satisfy (1) will be designated by $H_g$. Let w(z) be any nonnegative Borel-measurable function defined on $-\infty < z < \infty$. Denote by $H_g(w; P)$ the set of all $h \in H_g$ for which $E(w(h - g(P)); P)$, with $P \in \mathfrak{P}$, exists.

Definition 1. $h_0 \in H_g(w; P_0)$ is called locally w-minimal at $P_0 \in \mathfrak{P}$ if

$E(w(h_0 - g(P_0)); P_0) \le E(w(h - g(P_0)); P_0)$

for all $h \in H_g(w; P_0)$.

Definition 2. $h_0 \in \bigcap_{P \in \mathfrak{P}} H_g(w; P)$ is called uniformly w-minimal if

$E(w(h_0 - g(P)); P) \le E(w(h - g(P)); P)$

for all $h \in \bigcap_{P \in \mathfrak{P}} H_g(w; P)$ and every $P \in \mathfrak{P}$.

If w(z) is of the form $|z|^p$, $p \ge 1$, then we shall also use the phrase p-minimal instead of w-minimal. The meaning of $H_g(p; P)$ is then obvious: it is $H_g(w; P)$ with $w(z) = |z|^p$.

The case $w(z) = z^2$ is frequently treated in the literature. Only a few papers deal with more general loss functions w(z); I refer in this connection to investigations by Barankin [1].

We now give 

Definition 3. Let $V_p$ (p ≥ 1) be the class of all unbiased estimators v for g = 0 such that $E(|v|^p; P)$ exists for all $P \in \mathfrak{P}$. Let $V'_p$ be the class of all unbiased estimators v for g = 0 such that $E(|v|^p; P_0)$, $P_0 \in \mathfrak{P}$, exists. The class of all measurable functions h which satisfy $E(|h|^p; P) < \infty$ for all $P \in \mathfrak{P}$ will be denoted by $\mathcal{L}_p$.

Received April 17, 1959; revised June 20, 1960.

¹ Research done at the Adolf C. and Mary Sprague Miller Institute for Basic Research in Science, Berkeley, California.

² Now at the University of Hamburg.

For any p ≥ 1 and any measurable function h on R we will write $\|h\|_{p,P}$ for $(\int_R |h|^p\,dP)^{1/p}$. The Banach space of all functions h with finite norm $\|h\|_{p,P}$ will be denoted by $L^p_P$.

In [2], the following theorem was proved by the author.

Theorem 1. $h_0 \in \bigcap_{P \in \mathfrak{P}} H_g(p; P)$ is uniformly p-minimal (p > 1) if and only if the Fréchet differential $dL(h_0 - g(P); v)$ of the norm $\|h_0 - g(P)\|_{p,P}$ vanishes for all $v \in V_p$ and each $P \in \mathfrak{P}$.

Clearly, a similar theorem is valid for unbiased estimators which are locally $p$-minimal at $P_0$, replacing $V_p$ by $V_p^{P_0}$.

Moreover, I will make use of the following theorem [2], [3, p. 63]. 

THEOREM 2. If $w(z)$ is strictly convex, then there exists at most one unbiased estimator which is locally or uniformly $w$-minimal.

Remark. Clearly, the exact meaning of Theorem 2 is the following: if $h_0\in H_g(w;P_0)$ is locally $w$-minimal at $P_0$, then, for any other locally $w$-minimal $h\in H_g(w;P_0)$, we have $P_0(\{h\ne h_0\})=0$; and if $h_0\in\bigcap_{P\in\mathfrak{P}}H_g(w;P)$ is uniformly $w$-minimal, then, for all $P\in\mathfrak{P}$, we have $P(\{h\ne h_0\})=0$ for any other uniformly $w$-minimal $h\in\bigcap_{P\in\mathfrak{P}}H_g(w;P)$. Similar remarks apply to analogous cases. We shall now prove

THEOREM 3. Let $\mathfrak{P}$ be dominated by a probability measure $\mu$ with $\mu\in\mathfrak{P}$. The generalized density $dP/d\mu$ of $P\in\mathfrak{P}$ will be denoted by $f_P$. Suppose that $f_P\in L_q^\mu$ ($q>1$) for all $P\in\mathfrak{P}$. Let $G$ be the set of all real-valued functions $g_k$ on $\mathfrak{P}$ of the form $P\mapsto E(k;P)$ with $k\in L_p^\mu$ and $1/p+1/q=1$. Then $h\in H_g(p;\mu)$ with $g\in G$ is locally $p$-minimal at $\mu$ if and only if there exists a mapping $T$ defined on $G$ into the real numbers such that

(2) $T(g_k)=\int_R k\,|h-g(\mu)|^{p-1}\operatorname{sgn}(h-g(\mu))\,d\mu$

for all $k\in L_p^\mu$. The value of the minimum is given by $T(g-g(\mu))$.

The proof is based on two lemmas. 

LEMMA 1. Let $B$ be a Banach space with norm $\|\cdot\|$. Let $B^*$ be the conjugate space of $B$ and let $M^0\subset B^*$ be the annihilator of $M$, where $M$ is a closed linear manifold of $B$. Let $Q=B/M$ be the quotient space of $B$ by $M$ and let $\varphi$ be the canonical mapping of $B$ onto $Q$. Introducing the norm

$\|y\|=\inf_{\varphi(x)=y}\|x\|,$

$Q$ also becomes a Banach space. Let $Q^*$ be the conjugate space of $Q$. The mapping $\varphi^*$, the transformation adjoint to $\varphi$, is a one-to-one linear and isometric mapping of $Q^*$ onto $M^0$ [4, p. 115].

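LEMMA 2. $V_p^\mu$ is a closed linear manifold of $L_p^\mu$.

PROOF. It is clear that $V_p^\mu$ is a linear manifold. Moreover, $V_p^\mu$ is closed in $L_p^\mu$ because strong convergence in $L_p^\mu$ implies weak convergence: since $f_P\in L_q^\mu$, each map $v\mapsto\int_R vf_P\,d\mu=E(v;P)$ is a continuous linear functional on $L_p^\mu$, so a strong limit of unbiased estimators of zero is again an unbiased estimator of zero.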





1156 L. SCHMETTERER 


PROOF OF THEOREM 3. First, let $T$ be a mapping of $G$ into the set of real numbers which satisfies (2) for some $h\in H_g(p;\mu)$. Choose $B=L_p^\mu$ and $M=V_p^\mu$. There is a one-to-one correspondence between $G$ and the set of all classes $H_{g_k}(p;\mu)$ (with $k\in L_p^\mu$). Thus, there is a one-to-one correspondence between $G$ and $Q=L_p^\mu/V_p^\mu$. Let us now consider $T$ as a functional on $Q$. Clearly, $T$ must be linear and bounded. Now an application of Lemma 1 shows that necessarily

(3) $\int_R v\,|h-g(\mu)|^{p-1}\operatorname{sgn}(h-g(\mu))\,d\mu=0$

for all $v\in V_p^\mu$. But Theorem 1 implies that $h$ is locally $p$-minimal at $\mu$. On the other hand, if $h\in H_g(p;\mu)$ is locally $p$-minimal, we again have (3) for all $v\in V_p^\mu$ according to Theorem 1. Denote by $L$ the linear functional defined by

$k\mapsto\int_R k\,|h-g(\mu)|^{p-1}\operatorname{sgn}(h-g(\mu))\,d\mu$

for all $k\in L_p^\mu$. We can define $T$ by $\varphi^{*-1}(L)$. Clearly,

$T(g-g(\mu))=\int_R|h-g(\mu)|^p\,d\mu.$

For the case p = 2, Theorem 3 has been proved by Stein [5] by a different 
method. 

Next we give 

DEFINITION 5. Let $p>1$ and $1/p+1/q=1$. We define a transformation $N$ of $L_p^\mu$ to $L_q^\mu$ by $f\mapsto|f|^{p-1}\operatorname{sgn}f$ for all $f\in L_p^\mu$. If $f$ runs through a subset $C\subset L_p^\mu$, we write simply $NC$ for the set of all $Nf$ with $f\in C$. Clearly, for all $k\in L_q^\mu$, $N^{-1}k$ exists and is given by $|k|^{q-1}\operatorname{sgn}k$, since $(p-1)(q-1)=1$.
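As a small numerical illustration of Definition 5 (a sketch, not part of the original paper), the following Python code applies $N$ and $N^{-1}$ pointwise to a function on a finite sample space with an assumed discrete measure $\mu$. It checks two facts used repeatedly below: $N^{-1}(Nf)=f$, and $\int|Nf|^q\,d\mu=\int|f|^p\,d\mu$ because $(p-1)q=p$. The helper names `N`, `N_inv` and `integral_abs_pow`, and the numbers in `mu` and `f`, are hypothetical.

```python
import math

def N(f, p):
    """Pointwise map f -> |f|^(p-1) sgn f (Definition 5)."""
    return [math.copysign(abs(x) ** (p - 1), x) for x in f]

def N_inv(k, p):
    """Pointwise inverse k -> |k|^(q-1) sgn k, where 1/p + 1/q = 1."""
    q = p / (p - 1)
    return [math.copysign(abs(x) ** (q - 1), x) for x in k]

def integral_abs_pow(f, weights, r):
    """Integral of |f|^r with respect to a discrete measure given by weights."""
    return sum(w * abs(x) ** r for x, w in zip(f, weights))

# Assumed example: five points with probabilities mu and an arbitrary f.
mu = [0.1, 0.3, 0.2, 0.25, 0.15]
f = [1.5, -0.2, 0.0, -2.0, 0.7]

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)
    Nf = N(f, p)
    # The two integrals agree because (p - 1) q = p.
    print(p, integral_abs_pow(Nf, mu, q), integral_abs_pow(f, mu, p))
```

For $p=2$ the map $N$ is the identity, which is why the case $p=2$ is so much simpler in what follows.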

It is not difficult to find applications of Theorem 3 which are generalizations 
of corresponding applications by Stein. This leads, e.g., to 

THEOREM 3'. Let there be given a $\sigma$-algebra $\mathcal{S}$ of subsets of $\mathfrak{P}$ and a $\sigma$-finite, totally additive (in general signed) measure $m$ over $(\mathfrak{P},\mathcal{S})$, and suppose that $\mathfrak{P}$ satisfies the conditions of Theorem 3. Suppose further that $f_P$, considered as a function on $R\times\mathfrak{P}$, is measurable. If $\int_R|k|\int_{\mathfrak{P}}|f_P|\,d|m|\,d\mu$ exists for all $k\in L_p^\mu$ and if $E\bigl(N^{-1}\int_{\mathfrak{P}}f_P\,dm;\mu\bigr)=0$, then $N^{-1}\int_{\mathfrak{P}}f_P\,dm$ is locally $p$-minimal at $\mu$.

PROOF. Denote the mapping $P\mapsto E(k;P)$ with $k\in L_p^\mu$ by $g_k$. It is enough to observe that

$T(g_k)=\int_R\int_{\mathfrak{P}}k\,f_P\,dm\,d\mu$

exists for all $k\in L_p^\mu$ and satisfies the conditions of Theorem 3.

We will illustrate this theorem for the case $p=3$ by a simple example which, however, is general enough to serve as a pattern for the general finite-dimensional case.


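EXAMPLE 1: Suppose that $R=\{x_1,x_2,x_3,x_4\}$ is a finite set and $\mathfrak{S}$ the set of all subsets of $R$. Define

$P_1(x_i)=a_i$, $i=1,\dots,4$;
$P_2(x_i)=\alpha$, $i=1,3$; $P_2(x_i)=\beta>0$, $i=2,4$; $\alpha+\beta=\tfrac12$;
$\mu(x_i)=\beta$, $i=1,3$; $\mu(x_i)=\alpha$, $i=2,4$.

Let $\mathcal{S}$ be the set of all subsets of $\mathfrak{P}=\{P_1,P_2,\mu\}$ and define the measure $m$ by $m(P_1)=0$, $m(P_2)=\lambda_2$, $m(\mu)=\lambda_3$, where $\lambda_2$ and $\lambda_3$ are any real numbers. Obviously $P_1$ and $P_2$ are dominated by $\mu$, and we have

$f_{P_1}(x_i)=a_i/\beta$, $i=1,3$; $f_{P_1}(x_i)=a_i/\alpha$, $i=2,4$;
$f_{P_2}(x_i)=\alpha/\beta$, $i=1,3$; $f_{P_2}(x_i)=\beta/\alpha$, $i=2,4$.

We will now determine unbiased estimators which are locally 3-minimal at $\mu$. We have $\int_{\mathfrak{P}}f_P(x_i)\,dm=(\alpha/\beta)\lambda_2+\lambda_3$, $i=1,3$, and $\int_{\mathfrak{P}}f_P(x_i)\,dm=(\beta/\alpha)\lambda_2+\lambda_3$, $i=2,4$.
According to Theorem 3' (here $p=3$ and $q=3/2$, so that $N^{-1}k=|k|^{1/2}\operatorname{sgn}k$) we have to determine $\lambda_2$ and $\lambda_3$ in such a manner that

$\beta\,|(\alpha/\beta)\lambda_2+\lambda_3|^{1/2}\operatorname{sgn}((\alpha/\beta)\lambda_2+\lambda_3)+\alpha\,|(\beta/\alpha)\lambda_2+\lambda_3|^{1/2}\operatorname{sgn}((\beta/\alpha)\lambda_2+\lambda_3)=0.$
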

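It follows by a simple calculation that, if $\gamma$ is any real number (it plays the role of $\lambda_2$) and if $g$ is a function over $\mathfrak{P}$ defined by

$g(P_1)=\bigl(|\gamma|\,|\alpha^2-\beta^2|\bigr)^{1/2}(\beta^2+\alpha^2)^{-1/2}\bigl((a_1+a_3)(\alpha/\beta)^{1/2}\operatorname{sgn}(\gamma(\alpha^2-\beta^2))+(a_2+a_4)(\beta/\alpha)^{1/2}\operatorname{sgn}(\gamma(\beta^2-\alpha^2))\bigr),$
$g(P_2)=2\bigl(|\gamma|\,|\alpha^2-\beta^2|\bigr)^{1/2}(\beta^2+\alpha^2)^{-1/2}\bigl((\alpha^3/\beta)^{1/2}\operatorname{sgn}(\gamma(\alpha^2-\beta^2))+(\beta^3/\alpha)^{1/2}\operatorname{sgn}(\gamma(\beta^2-\alpha^2))\bigr),$
$g(\mu)=0,$

then

$h(x_i)=\bigl(|\gamma|\,|\alpha^2-\beta^2|\,\alpha/(\beta(\beta^2+\alpha^2))\bigr)^{1/2}\operatorname{sgn}(\gamma(\alpha^2-\beta^2)),\quad i=1,3,$
$h(x_i)=\bigl(|\gamma|\,|\beta^2-\alpha^2|\,\beta/(\alpha(\beta^2+\alpha^2))\bigr)^{1/2}\operatorname{sgn}(\gamma(\beta^2-\alpha^2)),\quad i=2,4,$

is the unbiased estimator for the function $g$ which is locally 3-minimal at $\mu$.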
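The closed-form expressions of Example 1 can be checked numerically. The Python sketch below uses assumed values $\alpha=0.1$, $\beta=0.4$, $(a_1,\dots,a_4)=(0.1,0.2,0.3,0.4)$ and takes $\gamma$ equal to $\lambda_2=1$ (illustrative choices and an identification made for this sketch, not taken from the paper). It picks $\lambda_3$ so that the side condition holds, builds $h=N^{-1}k$ with $q-1=\tfrac12$, and compares $E(h;P_1)$ and $E(h;P_2)$ with the closed forms $g(P_1)=s[(a_1+a_3)(\alpha/\beta)^{1/2}\sigma-(a_2+a_4)(\beta/\alpha)^{1/2}\sigma]$ and $g(P_2)=2s[(\alpha^3/\beta)^{1/2}\sigma-(\beta^3/\alpha)^{1/2}\sigma]$, where $s=(|\gamma||\alpha^2-\beta^2|/(\alpha^2+\beta^2))^{1/2}$ and $\sigma=\operatorname{sgn}(\gamma(\alpha^2-\beta^2))$; the factor 2 in $g(P_2)$ arises because $P_2$ puts mass $\alpha$ on both $x_1$ and $x_3$ and mass $\beta$ on both $x_2$ and $x_4$. Finally it confirms that $E(|h+\varepsilon v|^3;\mu)$ is smallest at $\varepsilon=0$ along an unbiased estimator $v$ of zero for $\{P_1,P_2,\mu\}$.

```python
import math

def sgn(x):
    return (x > 0) - (x < 0)

# Assumed illustrative parameters (alpha + beta = 1/2, a = P1 probabilities).
alpha, beta = 0.1, 0.4
a = [0.1, 0.2, 0.3, 0.4]
gamma = 1.0                       # identified with lambda_2 in this sketch
mu = [beta, alpha, beta, alpha]   # mu(x_1), ..., mu(x_4)
P2 = [alpha, beta, alpha, beta]

# Choose lambda_3 so that beta*N^{-1}(u) + alpha*N^{-1}(w) = 0, where
# u = (alpha/beta)*gamma + lambda_3 and w = (beta/alpha)*gamma + lambda_3.
u = alpha**2 * gamma * (alpha**2 - beta**2) / (alpha * beta * (alpha**2 + beta**2))
lam3 = u - (alpha / beta) * gamma
k = [(alpha / beta) * gamma + lam3, (beta / alpha) * gamma + lam3] * 2
h = [sgn(x) * math.sqrt(abs(x)) for x in k]   # N^{-1} k for p = 3

# Closed forms of Example 1.
d = alpha**2 - beta**2
s = math.sqrt(abs(gamma) * abs(d) / (alpha**2 + beta**2))
sg = sgn(gamma * d)
h13 = math.sqrt(abs(gamma) * abs(d) * alpha / (beta * (alpha**2 + beta**2))) * sg
h24 = math.sqrt(abs(gamma) * abs(d) * beta / (alpha * (alpha**2 + beta**2))) * -sg
g1 = s * ((a[0] + a[2]) * math.sqrt(alpha / beta) * sg
          + (a[1] + a[3]) * math.sqrt(beta / alpha) * -sg)
g2 = 2 * s * (math.sqrt(alpha**3 / beta) * sg + math.sqrt(beta**3 / alpha) * -sg)

def expect(weights, values):
    return sum(w * v for w, v in zip(weights, values))

# An unbiased estimator of zero for {P1, P2, mu}: v = (s1, t1, -s1, -t1).
s1, t1 = a[1] - a[3], a[2] - a[0]
v = [s1, t1, -s1, -t1]

def risk(eps):
    """E(|h + eps*v|^3; mu), the 3-risk at mu (g(mu) = 0)."""
    return sum(m_ * abs(x + eps * y) ** 3 for m_, x, y in zip(mu, h, v))

print(expect(mu, h), expect(a, h), g1, expect(P2, h), g2)
```

Because the risk is convex in $\varepsilon$ and its derivative at $0$ is $3\int vN h\,d\mu=3(\lambda_2E(v;P_2)+\lambda_3E(v;\mu))=0$, the minimum along every such direction is attained at $\varepsilon=0$.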

Clearly, if we had taken $m(P_1)=\lambda_1\ne0$, then we would have obtained a two-parametric class of unbiased estimators which are locally 3-minimal at $\mu$ for a corresponding two-parametric class of functions $g$ which vanish at $\mu$. Hence it is possible to determine the locally 3-minimal unbiased estimator for every function $g$ on $\mathfrak{P}$ that vanishes at $\mu$ by solving an algebraic equation for $\lambda_1,\lambda_2,\lambda_3$ which is at most of the second degree (cf. also Example 2).

Let $G$ have the same significance as in Theorem 3 and let $G_0$ be the subset of all functions $g\in G$ with $g(\mu)=0$. We denote the set of all unbiased estimators for $g\in G_0$ which are locally $p$-minimal at $\mu$ by $T_{p,0}^\mu$, and the corresponding set for all $g\in G$ by $T_p^\mu$.

We now prove 

THEOREM 4. Suppose that $\mathfrak{P}$ satisfies the conditions of Theorem 3. $T_{p,0}^\mu$ can be mapped by a one-to-one transformation onto a subset $W$ of a closed linear manifold $U\subset L_q^\mu$, where $U$ is the closed linear manifold spanned by all $f_P$ and $W$ is the set of all $k\in U$ with $E(N^{-1}k;\mu)=0$. $U\cap NH_g(p;\mu)$ contains for each $g\in G_0$ exactly one element $k_g$, and $N^{-1}k_g$ is an unbiased estimator for $g$ and locally $p$-minimal at $\mu$.

Of course, this theorem is strongly related to Theorem 3. First we formulate a theorem of Barankin [1] as

LEMMA 3. Suppose that $\mathfrak{P}$ is dominated by $\mu$ with $\mu\in\mathfrak{P}$. Suppose further that $f_P\in L_q^\mu$ ($1\le q<\infty$) for all $P\in\mathfrak{P}$. Then for each nonempty class $H_g(p;\mu)$ there exists at least one unbiased estimator which is locally $p$-minimal at $\mu$, where $1/p+1/q=1$.

PROOF OF THE THEOREM. Let $k\in U$ satisfy

(3') $E(N^{-1}k;\mu)=0.$

According to Definition 3 we have, for each $v\in V_p^\mu$ and all $f_P$,

(4) $\int_R v\,f_P\,d\mu=0.$

If

(5) $k=\sum_{i=1}^n a_if_{P_i}$

for any natural $n$, any real numbers $a_i$ and $P_i\in\mathfrak{P}$, $i=1,\dots,n$, then (4) implies

(6) $\int_R v\,k\,d\mu=0$

for all $v\in V_p^\mu$. If $k$ satisfies condition (3'), then an application of Theorem 1 shows that $N^{-1}k$ is locally $p$-minimal at $\mu$.

If $\|k_n-k\|_{q,\mu}\to0$, where the $k_n$ are of the form (5), and if $k$ fulfills (3'), then $k$ also satisfies (6) for all $v\in V_p^\mu$. This implies that $N^{-1}k$ is locally $p$-minimal at $\mu$.

Now we have to show that $U\cap NH_g(p;\mu)$ is not empty for every $g\in G_0$. An application of Theorem 2 entails that this intersection contains at most one element. There exists, according to Lemma 3, an element $h\in H_g(p;\mu)$ which is locally $p$-minimal at $\mu$. Moreover, Barankin has proved the existence of a sequence $k_n\in U$ such that

$\int_R k_nh\,d\mu\to\|h\|_{p,\mu}^p\quad\text{and}\quad\|k_n\|_{q,\mu}\to\|Nh\|_{q,\mu}.$

It follows, using a theorem of Radon [6], that $\|k_n-Nh\|_{q,\mu}\to0$.
COROLLARY. For $p=2$, $U$ and $T_2^\mu$ are identical [7].

This follows from the fact that $N$ is the identity for $p=2$.

Let us denote by $T_p^P$ the set of all estimators which are locally $p$-minimal at $P\in\mathfrak{P}$ for some real-valued function on $\mathfrak{P}$.

THEOREM 5. Let $\mathfrak{P}$ be any (not necessarily dominated) set of probability measures defined on $\mathfrak{S}$, and suppose $P\in\mathfrak{P}$. If $h\in T_p^P$, then, for any constant $d$, $h+d$ and $dh$ are also in $T_p^P$. $T_p^P$ is in general not linear.

PROOF. The first (positive) part of the theorem is a trivial application of Theorem 1. Further, it is almost obvious that $T_p^P$ for $p\ne2$ is not linear. We consider a simple example.

EXAMPLE 2: Let $t_1,t_2,t_3$ ($0<t_i<1$) be a set of three real numbers including $\tfrac12$. Let $a_1,\dots,a_4$ be any distinct real numbers. Let $\mathfrak{P}=\{P_{t_1},P_{t_2},P_{t_3}\}$ be given by $P_{t_i}(a_1)=(1-t_i)^2$, $P_{t_i}(a_2)=t_i-t_i^2/2$, $P_{t_i}(a_3)=t_i-t_i^2$, $P_{t_i}(a_4)=t_i^2/2$, and $P_{t_i}(M)=0$ for each set $M$ of real numbers which contains none of the numbers $a_1,\dots,a_4$.

Consider the two functionals on $\mathfrak{P}$, $g_1(t_i)=t_i$ and $g_2(t_i)=t_i^2$, $i=1,2,3$. It is easy to see that the set $H_{g_1}$ consists of the following functions:


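$h^{(1)}(a_1)=0,\quad h^{(1)}(a_2)=1-x,\quad h^{(1)}(a_3)=x,\quad h^{(1)}(a_4)=1+x,\qquad-\infty<x<\infty.$

For the determination of the unbiased estimator $h_3^{(1)}$ which is locally 3-minimal at $P_{1/2}$, one obtains the equation

$3(\tfrac12-x)^2\operatorname{sgn}(x-\tfrac12)+2(x-\tfrac12)^2\operatorname{sgn}(x-\tfrac12)+(\tfrac12+x)^2\operatorname{sgn}(\tfrac12+x)=0,$

and $h_3^{(1)}$ is determined by the solution $x_0^{(1)}=(3-\sqrt5)/4$. The set $H_{g_2}$ consists of the following functions:

$h^{(2)}(a_1)=0,\quad h^{(2)}(a_2)=-x,\quad h^{(2)}(a_3)=x,\quad h^{(2)}(a_4)=2+x,\qquad-\infty<x<\infty.$

For the determination of $h_3^{(2)}$ we must consider the equation

$3(x+\tfrac14)^2\operatorname{sgn}(x+\tfrac14)+2(x-\tfrac14)^2\operatorname{sgn}(x-\tfrac14)+(\tfrac74+x)^2\operatorname{sgn}(\tfrac74+x)=0,$

and the relevant solution of this equation is given by $x_0^{(2)}=(3-\sqrt{53})/8$.
Finally, let $g_3(t_i)=g_1(t_i)+g_2(t_i)$, $i=1,2,3$. The set $H_{g_3}$ consists of the functions

$h^{(3)}(a_1)=0,\quad h^{(3)}(a_2)=1-y,\quad h^{(3)}(a_3)=y,\quad h^{(3)}(a_4)=3+y,\qquad-\infty<y<\infty.$

The solution $y_0=(9-\sqrt{141})/8$ of the equation

$3(\tfrac14-y)^2\operatorname{sgn}(y-\tfrac14)+2(y-\tfrac34)^2\operatorname{sgn}(y-\tfrac34)+(\tfrac94+y)^2\operatorname{sgn}(\tfrac94+y)=0$

determines $h_3^{(3)}$. Clearly, $h_3^{(1)}+h_3^{(2)}\ne h_3^{(3)}$, since $x_0^{(1)}+x_0^{(2)}\ne y_0$.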
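Example 2 can be reproduced numerically. The Python sketch below (an illustration, not the author's computation) parametrizes each class $H_g$ as $h=(0,\,b-x,\,x,\,d+x)$, with $(b,d)=(1,1)$, $(0,2)$, $(1,3)$ for $g_1$, $g_2$, $g_3$, checks unbiasedness, and minimizes $E(|h-g(\tfrac12)|^3;P_{1/2})$ by bisection on its derivative, which increases with $x$ because the risk is strictly convex. The function names are hypothetical.

```python
import math

def probs(t):
    """P_t(a_1), ..., P_t(a_4) from Example 2."""
    return [(1 - t) ** 2, t - t * t / 2, t - t * t, t * t / 2]

def estimator(b, d, x):
    """Member of H_g with h(a_1)=0, h(a_2)=b-x, h(a_3)=x, h(a_4)=d+x."""
    return [0.0, b - x, x, d + x]

def target(b, d, t):
    """The function g(t) that every member of the family estimates without bias."""
    return b * t + (d - b) * t * t / 2

def local_3_minimal(b, d, t=0.5, lo=-10.0, hi=10.0, iters=200):
    """Bisection on d/dx E(|h - g(t)|^3; P_t) (constant factor 3 dropped)."""
    p, g = probs(t), target(b, d, t)
    dh = [0.0, -1.0, 1.0, 1.0]

    def deriv(x):
        h = estimator(b, d, x)
        return sum(pi * di * (hv - g) * abs(hv - g) for pi, di, hv in zip(p, dh, h))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if deriv(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

x1 = local_3_minimal(1, 1)   # g_1(t) = t
x2 = local_3_minimal(0, 2)   # g_2(t) = t^2
y0 = local_3_minimal(1, 3)   # g_3(t) = t + t^2
print(x1, (3 - math.sqrt(5)) / 4)
print(x2, (3 - math.sqrt(53)) / 8)
print(y0, (9 - math.sqrt(141)) / 8)
print("x1 + x2 =", x1 + x2, "but y0 =", y0)
```

The numerical roots agree with $(3-\sqrt5)/4$, $(3-\sqrt{53})/8$ and $(9-\sqrt{141})/8$, and $x_0^{(1)}+x_0^{(2)}\approx-0.344$ differs from $y_0\approx-0.359$, which is exactly the failure of linearity claimed in Theorem 5.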
THEOREM 6. Suppose that $\mathfrak{P}$ satisfies the conditions of Theorem 3. Then $T_p^\mu$ is closed in $L_p^\mu$.

We need the following







LEMMA 4. Let $f_1$ and $f_2$ be in $L_p^P$. We have the inequality

(7) $\int_R|Nf_1-Nf_2|^q\,dP\le C(p,\|f_1\|_{p,P},\|f_2\|_{p,P})\,\|f_1-f_2\|_{p,P},$

where $p>1$, $1/p+1/q=1$ and $C(p,\|f_1\|_{p,P},\|f_2\|_{p,P})=2^{q+1}p\,(\|f_1\|_{p,P}+\|f_2\|_{p,P})^{p-1}$.

PROOF. For $r\ge1$ and any real numbers $y,z$ the following inequalities are valid:

(8) $|y-z|^r\le2^r\,\bigl||y|^r\operatorname{sgn}y-|z|^r\operatorname{sgn}z\bigr|$

and

(9) $\bigl||y|^r\operatorname{sgn}y-|z|^r\operatorname{sgn}z\bigr|\le2r\,|y-z|\,(|y|+|z|)^{r-1}.$

(For a proof, compare [8, p. 221].)

We use first (8) for $y=|f_1|^{p-1}\operatorname{sgn}f_1$, $z=|f_2|^{p-1}\operatorname{sgn}f_2$ and $r=q$ (note that then $|y|^q\operatorname{sgn}y=|f_1|^p\operatorname{sgn}f_1$, because $(p-1)q=p$), and then (9) for $y=f_1$, $z=f_2$ and $r=p$, and so obtain

$\bigl||f_1|^{p-1}\operatorname{sgn}f_1-|f_2|^{p-1}\operatorname{sgn}f_2\bigr|^q\le2^{q+1}p\,|f_1-f_2|\,(|f_1|+|f_2|)^{p-1}$

(up to sets of $P$-measure 0, of course). Integrating, and applying Hölder's and Minkowski's inequalities, gives (7).
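A quick randomized sanity check (not a proof) of the inequality $\int|Nf_1-Nf_2|^q\,dP\le2^{q+1}p(\|f_1\|_{p,P}+\|f_2\|_{p,P})^{p-1}\|f_1-f_2\|_{p,P}$ on finite probability spaces. The sample sizes, value ranges, seed and the name `lemma4_holds` are assumptions of this Python sketch.

```python
import math
import random

def N(x, p):
    """|x|^(p-1) sgn x."""
    return math.copysign(abs(x) ** (p - 1), x)

def lemma4_holds(p, trials=2000, seed=0):
    """Test the Lemma 4 bound on random discrete P and random f_1, f_2."""
    rng = random.Random(seed)
    q = p / (p - 1)
    for _ in range(trials):
        n = rng.randint(1, 6)
        w = [rng.random() + 1e-9 for _ in range(n)]
        total = sum(w)
        w = [x / total for x in w]            # a discrete probability P
        f1 = [rng.uniform(-3, 3) for _ in range(n)]
        f2 = [rng.uniform(-3, 3) for _ in range(n)]

        def norm(f, r):
            return sum(wi * abs(x) ** r for wi, x in zip(w, f)) ** (1 / r)

        lhs = sum(wi * abs(N(a, p) - N(b, p)) ** q for wi, a, b in zip(w, f1, f2))
        C = 2 ** (q + 1) * p * (norm(f1, p) + norm(f2, p)) ** (p - 1)
        rhs = C * norm([a - b for a, b in zip(f1, f2)], p)
        if lhs > rhs * (1 + 1e-12):
            return False
    return True

for p in (1.2, 1.5, 2.0, 3.0, 5.0):
    print(p, lemma4_holds(p))
```

The constant is far from sharp; what matters for Theorem 6 is only that $N$ is continuous from $L_p^P$ to $L_q^P$ on bounded sets.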

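PROOF OF THE THEOREM. Let $h_n\in T_p^\mu$ and $\|h_n-h\|_{p,\mu}\to0$ for some $h\in L_p^\mu$. We have $g_n(P)=E(h_n;P)\to E(h;P)=g(P)$ for all $P\in\mathfrak{P}$, because $f_P\in L_q^\mu$. It follows that

$\|h_n-g_n(\mu)-(h-g(\mu))\|_{p,\mu}\to0.$

The inequality (7) of Lemma 4 implies

(10) $\|N(h_n-g_n(\mu))-N(h-g(\mu))\|_{q,\mu}\to0.$

Now

$\int_R v\,|h_n-g_n(\mu)|^{p-1}\operatorname{sgn}(h_n-g_n(\mu))\,d\mu=0$

for all $v\in V_p^\mu$ and $n=1,2,\dots$. Therefore, by Hölder's inequality, (10) implies

$\int_R v\,|h-g(\mu)|^{p-1}\operatorname{sgn}(h-g(\mu))\,d\mu=0$

for all $v\in V_p^\mu$, so that, by Theorem 1, $h$ is locally $p$-minimal at $\mu$, i.e., $h\in T_p^\mu$.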

It is well known that there is a strong relation between the concepts of sufficiency and of uniform $w$-minimality. In this connection the following definition [9] is important.

DEFINITION 6. A subalgebra $\mathfrak{S}_0$ of $\mathfrak{S}$ is called $p$-complete if zero is the unique $\mathfrak{S}_0$-measurable element of $V_p$.

There is the following important result [10], [11], [9], which we formulate as

LEMMA 5. If there exists a sufficient and $p$-complete subalgebra $\mathfrak{S}_0$ of $\mathfrak{S}$ for $\mathfrak{P}$, then an $\mathfrak{S}_0$-measurable uniformly $p$-minimal estimator exists for each $g$ for which $\bigcap_{P\in\mathfrak{P}}H_g(p;P)$ is nonempty.

For the case $p=2$, Bahadur [7] gave an interesting converse theorem. It seems that such a theorem does not exist in the more general cases. But by modifying Bahadur's ideas it is possible to give the following

THEOREM 7. Let $\mathfrak{P}$ be any class of probability measures. Consider the set $C_p$ of all characteristic functions of sets in $\mathfrak{S}$ which are uniformly $p$-minimal ($p>1$). Denote by $\mathfrak{S}_0$ the smallest subalgebra of $\mathfrak{S}$ such that all functions of $C_p$ are $\mathfrak{S}_0$-measurable. Then $\mathfrak{S}_0$ is a $p$-complete subfield, and all $\mathfrak{S}_0$-measurable functions in $E_p$ (Definition 3) are uniformly $p$-minimal estimators.

PROOF. Let $A\subset R$ be any set. We denote by $c_A$ the characteristic function of this set. Consider now a set $A\in\mathfrak{S}$ and suppose that $c_A\in C_p$. Then we have $\int_R vN(c_A-P(A))\,dP=0$ for all $v\in V_p$ and each $P\in\mathfrak{P}$. Suppose $0<P(A)<1$. Since $N(c_A-P(A))$ equals $(1-P(A))^{p-1}$ on $A$ and $-(P(A))^{p-1}$ on $R-A$, and since $\int_R v\,dP=0$, it follows that

$\int_A v\,[(1-P(A))^{p-1}+(P(A))^{p-1}]\,dP=0$

for all $v\in V_p$ and each $P\in\mathfrak{P}$. This means $\int_A v\,dP=0$ for all $v\in V_p$ and each $P\in\mathfrak{P}$, or

(11) $\int_R v\,c_A\,dP=0.$

Obviously, (11) holds also for the cases $P(A)=0$ and $P(A)=1$. We have $0\le c_A\le1$, and so by (11) $vc_A\in V_p$ for all $v\in V_p$. Now, if $B\in\mathfrak{S}$ is a different set with $c_B\in C_p$, we have instead of (11) $\int_R v\,c_B\,dP=0$ for all $v\in V_p$ and each $P\in\mathfrak{P}$. It follows that

(12) $\int_R v\,c_A\,c_B\,dP=\int_R v\,c_{A\cap B}\,dP=0$

for all $v\in V_p$ and each $P\in\mathfrak{P}$. Suppose $0<P(A\cap B)<1$. Consider

$\int_R vN(c_{A\cap B}-P(A\cap B))\,dP$

for a $v\in V_p$ and a $P\in\mathfrak{P}$. (12) implies that this integral vanishes, and so $c_{A\cap B}\in C_p$. Moreover, if $c_A\in C_p$, it follows that $1-c_A=c_{R-A}\in C_p$ by using Theorem 5. Finally, consider a denumerable class of sets $A_i\in\mathfrak{S}$ which are pairwise disjoint and such that $c_{A_i}\in C_p$. Denote $\bigcup_{i=1}^\infty A_i$ by $A$. We have $c_A=\sum_{i=1}^\infty c_{A_i}$, and so $\|\sum_{i=1}^n c_{A_i}-c_A\|_{p,P}\to0$ for all $P\in\mathfrak{P}$. Theorem 6 gives $c_A\in C_p$. Thus, we have proved that the class of all sets $A\in\mathfrak{S}$ for which $c_A$ belongs to $C_p$ forms a $\sigma$-algebra, and obviously this must be $\mathfrak{S}_0$. It is easy to show that $\alpha_1c_{A_1}+\alpha_2c_{A_2}$, for any real numbers $\alpha_i$ and $c_{A_i}\in C_p$, is a uniformly $p$-minimal estimator. Let $h\in E_p$ and let $h$ be $\mathfrak{S}_0$-measurable. Then there always exists a sequence of functions of the form $\sum_{i=1}^n\alpha_ic_{A_i}$, $\alpha_i$ real numbers, $c_{A_i}\in C_p$, which converges to $h$ and is such that $\|\sum_{i=1}^n\alpha_ic_{A_i}-h\|_{p,P}\to0$ for every $P\in\mathfrak{P}$. It follows that $h$ is uniformly $p$-minimal.

If $h\in E_p$ is $\mathfrak{S}_0$-measurable and an unbiased estimator for $g=0$, $h$ must be uniformly $p$-minimal and so equal to zero. Moreover, it is easy to show that $\mathfrak{S}_0$ is necessary. (Cf. [7].)
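The first step of the proof of Theorem 7 rests on an elementary identity: if $\int_R v\,dP=0$, then $\int_R vN(c_A-P(A))\,dP=[(1-P(A))^{p-1}+P(A)^{p-1}]\int_A v\,dP$. The following Python sketch checks it on randomly generated finite examples; the sample-space size, the seeds and the helper name `identity_sides` are assumptions made for illustration.

```python
import math
import random

def identity_sides(p, seed, n=6):
    """Return both sides of the identity for a random discrete P, v and A."""
    rng = random.Random(seed)
    w = [rng.random() + 0.01 for _ in range(n)]
    total = sum(w)
    w = [x / total for x in w]                        # a discrete P
    v = [rng.uniform(-1, 1) for _ in range(n)]
    mean = sum(wi * vi for wi, vi in zip(w, v))
    v = [vi - mean for vi in v]                       # now E(v; P) = 0
    A = [rng.random() < 0.5 for _ in range(n)]        # a random event A
    PA = sum(wi for wi, inA in zip(w, A) if inA)

    def N(x):
        return math.copysign(abs(x) ** (p - 1), x)

    lhs = sum(wi * vi * N((1.0 if inA else 0.0) - PA)
              for wi, vi, inA in zip(w, v, A))
    int_A_v = sum(wi * vi for wi, vi, inA in zip(w, v, A) if inA)
    # max(..., 0.0) guards against 1 - PA being a tiny negative rounding error
    rhs = (max(1 - PA, 0.0) ** (p - 1) + PA ** (p - 1)) * int_A_v
    return lhs, rhs

print(identity_sides(3.0, seed=1))
```

Since the bracket $(1-P(A))^{p-1}+P(A)^{p-1}$ is strictly positive, the vanishing of the left side forces $\int_A v\,dP=0$, which is relation (11).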

Concerning sufficiency it is possible to show 

THEOREM 8. Let $\mathfrak{P}$ be a dominated class of probability measures, $\mu$ a measure equivalent to $\mathfrak{P}$, and $\mu\in\mathfrak{P}$. Suppose that $\mathfrak{P}$ is a convex set. Consider the set $T_{p,b}$ of all bounded uniformly $p$-minimal estimators, and denote by $\mathfrak{S}'$ the smallest subalgebra of $\mathfrak{S}$ such that all elements of $T_{p,b}$ are $\mathfrak{S}'$-measurable. We assume further: if a real-valued function $g$ on $\mathfrak{P}$ has a bounded unbiased estimator, then it also has a uniformly $p$-minimal unbiased estimator. Then $\mathfrak{S}'$ is sufficient for $\mathfrak{P}$.

PROOF. We remark that the existence of a measure $\mu$ which is equivalent to $\mathfrak{P}$ can be proved in the dominated case [12]. Let $P_1,P_2\in\mathfrak{P}$ and $P_1\ne P_2$. The measure

$\lambda=\alpha_1P_1+\alpha_2P_2+\alpha_3\mu,\qquad\alpha_i>0,\quad\alpha_1+\alpha_2+\alpha_3=1,$

is equivalent to $\mu$ and so to $\mathfrak{P}$; moreover, $\lambda\in\mathfrak{P}$. We have

$1=\alpha_1(dP_1/d\lambda)+\alpha_2(dP_2/d\lambda)+\alpha_3(d\mu/d\lambda).$

It follows that $dP_i/d\lambda=f_{P_i}$, $i=1,2$, is bounded (by $1/\alpha_i$). Consider $E(f_{P_i};P)=g_i(P)$ for all $P\in\mathfrak{P}$. By the boundedness of $f_{P_i}$ there exist uniformly $p$-minimal unbiased estimators $h_i$ for $g_i$.

Let $V$ be the class of all unbiased estimators $v$ for the zero functional on $\mathfrak{P}$. We have

(13) $\int_R v\,f_{P_i}\,d\lambda=0$

for all $v\in V$, and so for all $v\in V_p$.

If $E(N^{-1}f_{P_i};\lambda)=0$, then according to Theorem 4, $N^{-1}f_{P_i}$ must be locally $p$-minimal at $\lambda$ for $g_i$. But $\lambda$ is equivalent to $\mathfrak{P}$, and moreover $N^{-1}f_{P_i}\in E_p$. Thus, we must have $N^{-1}f_{P_i}=h_i$ according to Theorem 2. Since $N^{-1}f_{P_i}$ is bounded, we have $h_i\in T_{p,b}$. Therefore, $f_{P_i}$ is $\mathfrak{S}'$-measurable. However, in general $E(N^{-1}f_{P_i};\lambda)\ne0$.

Let $\gamma$ be any real number. By (13) we also have

$\int_R v\,(f_{P_i}+\gamma)\,d\lambda=0\quad\text{for all }v\in V.$

Consider

$\eta(\gamma)=\int_R|f_{P_i}+\gamma|^{q-1}\operatorname{sgn}(f_{P_i}+\gamma)\,d\lambda.$

It is easy to show, by using Lemma 4, that this integral is a continuous function $\eta$ of $\gamma$ for $-\infty<\gamma<\infty$. If $\gamma>0$ is large enough, $\eta(\gamma)$ must be $>0$, because $f_{P_i}$ is bounded. If $\gamma<0$ and $|\gamma|$ is large enough, $\eta(\gamma)<0$. Hence there is at least one $\gamma=\gamma_0$ with $\eta(\gamma_0)=0$.
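The intermediate-value step above can be made concrete. In this Python sketch a discrete $\lambda$ and a bounded density $f=f_{P_i}$ are assumed (illustrative numbers with $\int f\,d\lambda=1$). Since $\eta(\gamma)=\int|f+\gamma|^{q-1}\operatorname{sgn}(f+\gamma)\,d\lambda$ is continuous and nondecreasing in $\gamma$, bisection between a point where every $f+\gamma<0$ and one where every $f+\gamma>0$ finds a root $\gamma_0$. For $p=q=2$ the root is simply $\gamma_0=-\int f\,d\lambda=-1$.

```python
import math

lam = [0.25, 0.25, 0.25, 0.25]   # assumed discrete lambda
f = [0.2, 1.0, 1.3, 1.5]         # assumed bounded density: sum(lam * f) = 1

def eta(gamma, p):
    """eta(gamma) = integral of |f + gamma|^(q-1) sgn(f + gamma) d lambda."""
    q = p / (p - 1)
    return sum(l * math.copysign(abs(x + gamma) ** (q - 1), x + gamma)
               for l, x in zip(lam, f))

def root_gamma(p, iters=200):
    """Bisection for gamma_0 with eta(gamma_0) = 0."""
    lo, hi = -max(f) - 1.0, -min(f) + 1.0   # eta(lo) < 0 < eta(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if eta(mid, p) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

for p in (1.5, 2.0, 3.0):
    g0 = root_gamma(p)
    print(p, g0, eta(g0, p))
```

Replacing $f_{P_i}$ by $f_{P_i}+\gamma_0$ then makes the side condition $E(N^{-1}(f_{P_i}+\gamma_0);\lambda)=0$ hold, which is what the next paragraph uses.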

We have to repeat the previous argument with $f_{P_i}$ replaced by $f_{P_i}+\gamma_0$. We obtain again the result that $f_{P_i}$ is $\mathfrak{S}'$-measurable. Thus we have proved that $\mathfrak{S}'$ is pairwise sufficient for $\mathfrak{P}$. This implies sufficiency in the dominated case [12].


REFERENCES 


[1] E. W. Barankin, "Locally best unbiased estimates," Ann. Math. Stat., Vol. 20 (1949), pp. 477-501.
[2] L. Schmetterer, "Bemerkungen zur Theorie der erwartungstreuen Schätzungen," Mitteil.-Bl. math. Statistik, Vol. 9 (1957), pp. 147-152.
[3] D. A. S. Fraser, Nonparametric Methods in Statistics, John Wiley and Sons, New York, 1957.
[4] N. Bourbaki, Livre V, Espaces Vectoriels Topologiques, Chapitres III-V, Hermann et Cie, Paris, 1955.
[5] Charles Stein, "Unbiased estimates with minimum variance," Ann. Math. Stat., Vol. 21 (1950), pp. 406-415.
[6] J. Radon, "Theorie und Anwendungen der absolut additiven Mengenfunktionen," Sitzungsberichte der math.-naturwiss. Klasse der Akad. der Wiss., Wien, CXXII, Abt. IIa (1913), pp. 1295-1438.
[7] R. R. Bahadur, "On unbiased estimates of uniformly minimum variance," Sankhyā, Vol. 18 (1957), pp. 211-224.
[8] N. Bourbaki, Livre VI, Intégration, Chapitres I-IV, Hermann et Cie, Paris, 1952.
[9] E. L. Lehmann and H. Scheffé, "Completeness, similar regions and unbiased estimation," Sankhyā, Vol. 10 (1950), pp. 305-339.
[10] D. Blackwell, "Conditional expectation and unbiased sequential estimation," Ann. Math. Stat., Vol. 18 (1947), pp. 105-110.
[11] E. W. Barankin, "Extension of a theorem of Blackwell," Ann. Math. Stat., Vol. 21 (1950), pp. 280-284.
[12] P. R. Halmos and L. J. Savage, "Application of the Radon-Nikodym theorem to the theory of sufficient statistics," Ann. Math. Stat., Vol. 20 (1949), pp. 225-241.





TWO-STAGE EXPERIMENTS FOR ESTIMATING A COMMON
MEAN¹


By DONALD RICHTER
University of North Carolina²


Summary. Let $\pi_1,\pi_2$ be two normal populations with common mean $\mu$ and variances $\sigma_1^2,\sigma_2^2$, where the parameter values are unknown. Suppose that it is desired to estimate $\mu$, and that the experimental procedure is to take $m$ observations from each population, compute variance estimates, and then take $n-2m$ observations from that population with the smaller observed variance, where $n$ has been fixed beforehand. Let $R_n(\theta,m)=V_0^{-1}E(\hat\mu-\mu)^2$ be the risk of the estimator $\hat\mu$, where $V_0=n^{-1}\min(\sigma_1^2,\sigma_2^2)$ and where $\theta=\sigma_2^2/\sigma_1^2$. For a class of "best" estimators, it is shown in this paper that $\sup_\theta R_n(\theta,m)\to1$ as $n\to\infty$ if and only if $m/n\to0$ and $m\to\infty$ as $n\to\infty$; that $\min_m\sup_\theta R_n(\theta,m)\sim1+Cn^{-1/3}$ as $n\to\infty$; and that the minimax sample size is $m\sim C'n^{2/3}$ as $n\to\infty$, where $C$ and $C'$ are constants.


1. Introduction. This investigation treats the problem of estimating the common mean $\mu$ of two populations using a fixed number $n$ of observations. If the population variances were known, the most efficient procedure would be to take all $n$ observations from that population with the smaller variance. When prior information about the variances is lacking or is too vague to be quantified, it is natural to consider the procedure which consists of taking a preliminary sample of size $m$ from each population, computing estimates of the variances, and then taking the remaining $n-2m$ observations from that population with the apparently smaller variance. Since, if $m$ is chosen too large or too small, the advantage of the two-stage sampling scheme over the procedure of simply taking $n/2$ observations from each population will be lost, the problem arises of determining for some good estimator an optimum choice of $m$ as a function of $n$, not dependent on the unknown variances.

As an example, we may suppose that we have available two devices for measuring a physical constant, that each measurement is expensive or time-consuming so that their total number is limited, and that we wish to estimate the constant as accurately as possible.

For related work on two-stage experiments with a fixed total sample size, 
reference should be made to Ghurye and Robbins [2], where it is shown that the 
ratio of the variance of a certain two-stage estimator for the difference of two 
means to the minimum variance tends to unity as the sample size increases, and 


Received August 3, 1959; revised July 25, 1960.

¹ This research was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under Contract No. AF 49(638)-261. Reproduction in whole or in part is permitted for any purpose of the United States Government.

² Present address: University of Minnesota, Minneapolis.


1164 





ESTIMATING A COMMON MEAN 1165 


to Putter [3], where an analogous result, among others, is obtained for a double sampling rule for estimating the mean of a stratified population. In neither paper is any indication given as to how the first-stage sample size might be chosen as a function of the total sample size. For an introduction to the problems of sequential experimentation, see Robbins [4].

In Section 2 the problem will be formulated explicitly, and a suitable risk function will be defined. In the remaining sections, where the populations are assumed normal, necessary and sufficient conditions for the risk to converge to unity will be obtained, and the asymptotic minimax value of $m$ will be derived; these results will be seen to be, in a certain sense, estimator-free.


2. Formulation of the problem. In this section, the problem will be formulated 
in an explicit and convenient manner. 


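Let $X_1,X_2,X_3,\dots$ and $Y_1,Y_2,Y_3,\dots$ be mutually independent random variables with common mean $\mu$ and $\operatorname{Var}X_i=\sigma_1^2$ and $\operatorname{Var}Y_i=\sigma_2^2$; write $\theta=\sigma_2^2/\sigma_1^2$. Let

$R=\dfrac{\sum_{i=1}^mX_i^2-\bigl(\sum_{i=1}^mX_i\bigr)^2/m}{\sum_{i=1}^mY_i^2-\bigl(\sum_{i=1}^mY_i\bigr)^2/m},$

so that $1/R$ is the usual estimator of $\theta$ based on $2m$ observations. Then our procedure is first to observe $X_1,X_2,\dots,X_m,Y_1,Y_2,\dots,Y_m$, and secondly to observe $X_{m+1},\dots,X_{n-m}$ if $R<1$, and $Y_{m+1},\dots,Y_{n-m}$ otherwise.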

Writing $\bar X_{N_1}=\sum_{i=1}^{N_1}X_i/N_1$ and $\bar Y_{N_2}=\sum_{i=1}^{N_2}Y_i/N_2$, we will consider estimators $\hat\mu$ of $\mu$ which can be written in the form

(1) $\hat\mu=A\bar X_{N_1}+B\bar Y_{N_2},$

where $N_1,N_2,A,B$ are random variables such that $N_1=n-m$ if $R<1$, $N_1=m$ if $R\ge1$, $N_1+N_2=n$, $0\le A\le1$, $0\le B\le1$ and $A+B=1$ with probability one; and, in addition, where $A$ and $B$ are such that

(1') $E_H\bar X_k=E\bar X_k,\quad E_H\bar Y_l=E\bar Y_l,\quad E_H\bar X_k\bar Y_l=E_H\bar X_k\cdot E_H\bar Y_l,\quad E_H\bar X_k^2=E\bar X_k^2,\quad E_H\bar Y_l^2=E\bar Y_l^2$ for all $k,l$,

where $E_H(\cdot)=E(\cdot\mid H)$ and $H=(A,B,N_1,N_2)$. If the $X_i$ and the $Y_i$ are assumed to be normally distributed, then, recalling that the sample mean and variance are independent for normal populations, Assumption (1') may be replaced by the assumption that $A$ and $B$ are functions of the sample variances only. Restriction to estimators of form (1) seems reasonable since, if observations are available on $X_1,\dots,X_{n_1},Y_1,\dots,Y_{n_2}$, the variables are normal, and $\theta$ is known, $a\bar X_{n_1}+b\bar Y_{n_2}$ is the uniformly minimum variance unbiased estimator of $\mu$, where $a=n_1\theta/(n_1\theta+n_2)$, $a+b=1$.

Next, let $V_0=(1/n)\min(\sigma_1^2,\sigma_2^2)$, which is the variance of the standard estimator of $\mu$ for the case when $\operatorname{sgn}(\sigma_1^2-\sigma_2^2)$ is known beforehand, and define





1166 DONALD RICHTER 


$R_n(\theta,m)=V_0^{-1}E(\hat\mu-\mu)^2$ to be the risk function associated with the estimator $\hat\mu$. That $R_n(\theta,m)=V_0^{-1}\operatorname{Var}\hat\mu$ follows from the first part of the following theorem.
THEOREM 1: For any estimator of form (1),

(a) $E\hat\mu=\mu$,
(b) $R_n(\theta,m)=n\max(1,1/\theta)\,E\{A^2/N_1+\theta B^2/N_2\}$,
(c) $R_n(\theta,m)\ge n\max(1,\theta)\,E\{1/(N_2+N_1\theta)\}\ge1$.


PROOF. Since $E_H\hat\mu=AE_H\bar X_{N_1}+BE_H\bar Y_{N_2}=\mu$, we have $E\hat\mu=\mu$. Next, $E_H(\hat\mu-\mu)^2=E_H(A\bar X_{N_1}+B\bar Y_{N_2}-\mu)^2=A^2\sigma_1^2/N_1+B^2\sigma_2^2/N_2$, which proves (b), since $R_n(\theta,m)=V_0^{-1}E\{E_H(\hat\mu-\mu)^2\}$. Finally, $A^2/N_1+\theta B^2/N_2$ has a unique minimum with respect to $A=1-B$ at $A=N_1\theta/(N_2+N_1\theta)$, so that $E\{A^2/N_1+\theta B^2/N_2\}\ge\theta E\{1/(N_2+N_1\theta)\}$, which proves the left-hand inequality of (c); since $N_2+N_1\theta\le n\max(1,\theta)$, the proof is complete.
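Part (c) of Theorem 1 rests on minimizing $A^2/N_1+\theta B^2/N_2$ over $A=1-B$. This small Python check (the sample values of $N_1$, $N_2$, $\theta$ and the function names are arbitrary assumptions) confirms the minimizer $A=N_1\theta/(N_2+N_1\theta)$, the minimum value $\theta/(N_2+N_1\theta)$, and the resulting bound $n\max(1,1/\theta)\cdot\theta/(N_2+N_1\theta)=n\max(1,\theta)/(N_2+N_1\theta)\ge1$.

```python
def conditional_loss(A, n1, n2, theta):
    """A^2/N_1 + theta*B^2/N_2 with B = 1 - A, as in Theorem 1(b)."""
    B = 1.0 - A
    return A * A / n1 + theta * B * B / n2

def optimal_weight(n1, n2, theta):
    """Minimizing A and the minimum value used in Theorem 1(c)."""
    A = n1 * theta / (n2 + n1 * theta)
    return A, theta / (n2 + n1 * theta)

for n1, n2, theta in [(90, 10, 0.5), (10, 90, 3.0), (50, 50, 1.0)]:
    A, best = optimal_weight(n1, n2, theta)
    grid_min = min(conditional_loss(i / 1000, n1, n2, theta) for i in range(1001))
    n = n1 + n2
    print(A, best, grid_min, n * max(1.0, theta) / (n2 + n1 * theta))
```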

It will be instructive at this point to examine the risk function for the usual one-stage experiment for estimating $\mu$, which would be to observe $n/2$ of the $X_i$ and $n/2$ of the $Y_i$. If we confine attention to estimators $\mu'$ such that $E\mu'=\mu$ and assume the variables normally distributed, then $\operatorname{Var}\mu'\ge2\sigma_2^2/n(1+\theta)$, since $(\theta\bar X+\bar Y)/(\theta+1)$ is the minimum variance unbiased estimator, with variance $2\sigma_2^2/n(1+\theta)$, when the variances are known. Then $R(\theta)=(\operatorname{Var}\mu')/V_0\ge2\max(\theta,1)/(1+\theta)$, and $2\max(\theta,1)/(1+\theta)\ge1$ with equality if and only if $\theta=1$. Hence, for each fixed $\theta\ne1$, the risk function is bounded away from unity, independently of the sample size. One would hope that the risk for the two-stage scheme would prove to be smaller, and we shall see (for large samples, at least) that this is in fact the case if $m$ is suitably chosen.
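The one-stage benchmark can be computed directly. This Python sketch (parameter values and function names are assumptions) evaluates the variance of $(\theta\bar X+\bar Y)/(\theta+1)$ with $n/2$ observations from each population and its ratio to $V_0$, reproducing $2\max(\theta,1)/(1+\theta)$; the ratio approaches 2 as $\theta\to\infty$ or $\theta\to0$.

```python
def one_stage_variance(sigma1_sq, sigma2_sq, n):
    """Variance of (theta*Xbar + Ybar)/(theta + 1) with n/2 draws from each population."""
    theta = sigma2_sq / sigma1_sq
    var_xbar, var_ybar = sigma1_sq / (n / 2), sigma2_sq / (n / 2)
    return (theta**2 * var_xbar + var_ybar) / (theta + 1) ** 2

def one_stage_risk(sigma1_sq, sigma2_sq, n):
    """Variance divided by V_0 = min(sigma1^2, sigma2^2)/n."""
    return one_stage_variance(sigma1_sq, sigma2_sq, n) / (min(sigma1_sq, sigma2_sq) / n)

for s2 in (0.25, 1.0, 4.0, 100.0):
    print(s2, one_stage_risk(1.0, s2, 100))
```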

Returning to the two-stage experiment, it is clear that, once an estimator is specified, the only variable left at the statistician's disposal is the quantity $m$. Then, given an estimator of form (1), we may say that any real-valued function $m(n)$ which satisfies $4\le2m(n)<n$ for all $n\ge5$ is a solution to the problem. Can we find an optimum solution?

With respect to an estimator of form (1), we shall call $m(n)$ a uniformly consistent solution (u.c.sol.) if $\sup_\theta R_n(\theta,m(n))\to1$ as $n\to\infty$; where they exist, we shall restrict attention to such solutions. Further, if $\sup_\theta R_n(\theta,m)<\infty$, a solution which minimizes $\sup_\theta R_n(\theta,m)$ will be called a minimax solution (m.m.sol.). If there exists a u.c.sol., then a m.m.sol. is u.c. too. Hence, the minimax principle affords a means of choosing one solution from the class of u.c. solutions.


3. A simple estimator. In this section an asymptotic minimax solution will be derived for a particular unbiased estimator. In this and the following section, we shall assume that the $X_i$'s and the $Y_i$'s are normally distributed. Hence, writing $S_1$ and $S_2$ for the numerator and the denominator of $R$, the statistic $R\theta=(S_1/\sigma_1^2)/(S_2/\sigma_2^2)$ obeys the $F$-distribution with $m-1$, $m-1$ degrees of freedom, and we may write

$I(\theta,m)=\Pr\{R\ge1\}=B\Bigl(\frac{m-1}{2},\frac{m-1}{2}\Bigr)^{-1}\int_\theta^\infty x^{(m-3)/2}(1+x)^{1-m}\,dx.$
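For numerical work, $I(\theta,m)$ can be evaluated with the substitution $w=x/(1+x)$, which turns the integrand into a $\mathrm{Beta}(\tfrac{m-1}{2},\tfrac{m-1}{2})$ density on $[\theta/(1+\theta),1]$. The following sketch uses Simpson's rule and only the Python standard library; the function name `I`, the step count and the restriction $m\ge4$ (so that the integrand vanishes at $w=1$) are assumptions of this illustration. For $m=5$ the integrand is a polynomial and the result is exact: $I(\theta,5)=1-3w_0^2+2w_0^3$ with $w_0=\theta/(1+\theta)$.

```python
import math

def I(theta, m, steps=2000):
    """I(theta, m) = Pr{R >= 1} = Pr{F >= theta} with F ~ F(m-1, m-1).

    Substituting w = x/(1+x) turns the integrand into the Beta(a, a)
    density, a = (m-1)/2, integrated over [theta/(1+theta), 1].
    Simpson's rule with an even number of steps; m >= 4 is assumed.
    """
    if m < 4:
        raise ValueError("this sketch assumes m >= 4")
    a = (m - 1) / 2.0
    log_beta = 2 * math.lgamma(a) - math.lgamma(2 * a)   # log B(a, a)
    w0 = theta / (1.0 + theta)

    def density(w):
        if w >= 1.0:
            return 0.0        # (1 - w)^(a - 1) vanishes at w = 1 since a > 1
        return math.exp((a - 1) * (math.log(w) + math.log1p(-w)) - log_beta)

    h = (1.0 - w0) / steps
    total = density(w0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(w0 + i * h)
    return total * h / 3

# Usage: by symmetry I(1, m) = 1/2 and I(theta, m) + I(1/theta, m) = 1.
print(I(1.0, 10), I(2.5, 10) + I(0.4, 10))
```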







We now define $\hat\mu_1=A_1\bar X_{N_1}+B_1\bar Y_{N_2}$, where $A_1=1$ or $0$ according as $R<1$ or $R\ge1$. This estimator has form (1) and, by Theorem 1, $\hat\mu_1$ is unbiased and

$R_{1n}(\theta,m)=n\max(1,1/\theta)\,E\{A_1/N_1+B_1\theta/N_2\}$
$\qquad=n\max(1,1/\theta)\Bigl[\dfrac{1-I(\theta,m)}{n-m}+\dfrac{\theta I(\theta,m)}{n-m}\Bigr]$
$\qquad=\max(1,1/\theta)(1-m/n)^{-1}[1+(\theta-1)I(\theta,m)].$

It is easy to show that $R_{1n}(\theta,m)=R_{1n}(1/\theta,m)$ by using the fact that $I(\theta,m)=1-I(1/\theta,m)$; thus, when considering $\sup_\theta R_{1n}(\theta,m)$, we may restrict ourselves to $\theta\ge1$. Before continuing with the study of the risk of the proposed estimator, we introduce some lemmas.

LEMMA 1: For $\theta\ge1$, $I(\theta,m)\le[2\theta^{1/2}/(1+\theta)]^{m-1}$.

LEMMA 2: Let $\theta=1+2\tau m^{-1/2}$. For $\tau\ge0$ and $\tau$ bounded, $I(\theta,m)-\Phi(-\tau)=O(m^{-1/2})$ uniformly in $\tau$ as $m\to\infty$.

PROOF OF THE LEMMAS: To begin with,

$I(\theta,m)=\Pr\{R\theta\ge\theta\}=\Pr\Bigl\{\sum_1^{m-1}U_i\Big/\sum_1^{m-1}V_i\ge\theta\Bigr\}=\Pr\{S_{m-1}\ge0\},$

where $S_{m-1}=Z_1+\dots+Z_{m-1}$, $Z_i=U_i-\theta V_i$, and $U_1,U_2,\dots,U_{m-1},V_1,V_2,\dots,V_{m-1}$ are independent random variables, each obeying the chi-square distribution with one degree of freedom. Then $Z_1,Z_2,\dots,Z_{m-1}$ are independent and identically distributed random variables, $EZ_i=1-\theta\le0$ since by hypothesis $\theta\ge1$, and $\operatorname{Var}Z_i=2(1+\theta^2)\ge4$.

The moment generating function of $Z_i$ is

$M(t)=Ee^{tZ_i}=[1+2(\theta-1)t-4\theta t^2]^{-1/2}.$

Using the Chebyshev inequality (in its exponential form), $I(\theta,m)=\Pr\{S_{m-1}\ge0\}\le Ee^{tS_{m-1}}=[M(t)]^{m-1}$, $t\ge0$. Since $M(t)$ is a minimum at $t=(\theta-1)/4\theta$, where $M(t)=2\theta^{1/2}/(1+\theta)$, we obtain Lemma 1.
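Lemma 1 is a Chernoff-type bound, and both of its ingredients can be checked numerically. This Python sketch re-implements $I(\theta,m)$ by Simpson's rule in the variable $w=x/(1+x)$ (an assumption of the sketch, valid for $m\ge4$), verifies that $t^*=(\theta-1)/4\theta$ minimizes $M(t)$ with $M(t^*)=2\theta^{1/2}/(1+\theta)$, and compares $I(\theta,m)$ with $M(t^*)^{m-1}$ for a few assumed values of $\theta$ and $m$.

```python
import math

def I(theta, m, steps=2000):
    """Pr{F >= theta}, F ~ F(m-1, m-1), by Simpson's rule in w = x/(1+x); m >= 4."""
    a = (m - 1) / 2.0
    log_beta = 2 * math.lgamma(a) - math.lgamma(2 * a)
    w0 = theta / (1.0 + theta)

    def density(w):
        if w >= 1.0:
            return 0.0
        return math.exp((a - 1) * (math.log(w) + math.log1p(-w)) - log_beta)

    h = (1.0 - w0) / steps
    total = density(w0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(w0 + i * h)
    return total * h / 3

def M(t, theta):
    """Moment generating function of Z = U - theta*V with U, V ~ chi-square(1)."""
    return (1 + 2 * (theta - 1) * t - 4 * theta * t * t) ** -0.5

def chernoff_bound(theta, m):
    """[M(t*)]^(m-1) with t* = (theta - 1)/(4 theta)."""
    t_star = (theta - 1) / (4 * theta)
    return M(t_star, theta) ** (m - 1)

for theta in (1.5, 2.0, 4.0):
    for m in (5, 9, 21):
        print(theta, m, I(theta, m), chernoff_bound(theta, m))
```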

Next, let $F_{m-1}(z)$ be the distribution function of the variable $(S_{m-1}-ES_{m-1})/(\operatorname{Var}S_{m-1})^{1/2}$, so that $1-I(\theta,m)=F_{m-1}(z)$ when $z=(\theta-1)(m-1)^{1/2}[2(1+\theta^2)]^{-1/2}$; if $\theta=1+2\tau m^{-1/2}$, $\tau\ge0$ and $\tau=O(1)$, then $z=\tau+\epsilon(\tau,m)$, where $\epsilon(\tau,m)=O(m^{-1/2})$ uniformly in $\tau$. Noting that $\tau$ bounded implies $\theta$ bounded, and that

$|Z_i-EZ_i|^3\le8\{|Z_i|^3+|EZ_i|^3\}\le8\{(U_i+\theta V_i)^3+(\theta-1)^3\},$

we observe that $E|Z_i-EZ_i|^3$ is bounded. Then the Berry-Esseen theorem [1] states that

(2) $|F_{m-1}(z)-\Phi(z)|<C\,E|Z_i-EZ_i|^3\big/[(\operatorname{Var}Z_i)^{3/2}(m-1)^{1/2}]$

for all $z$, where $C$ is a constant. Since $(\operatorname{Var}Z_i)^{3/2}\ge8$ and $(m-1)^{1/2}>m^{1/2}/2$, we find from (2) that $F_{m-1}(z)-\Phi(z)=O(m^{-1/2})$ uniformly in $\tau$. But $\Phi(z)-\Phi(\tau)=O(m^{-1/2})$ uniformly in $\tau$, since $|\Phi(z)-\Phi(\tau)|<|\epsilon(\tau,m)|$. Then $F_{m-1}(z)-\Phi(\tau)=O(m^{-1/2})$ uniformly in $\tau$ and, since $1-I(\theta,m)=F_{m-1}(z)$, the proof of Lemma 2 is complete.
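Lemma 2 can be illustrated numerically: with $\theta=1+2\tau m^{-1/2}$, the gap $I(\theta,m)-\Phi(-\tau)$ should shrink roughly like $m^{-1/2}$. This Python sketch re-implements $I(\theta,m)$ by Simpson's rule (an assumption of the illustration) and uses $\Phi(x)=\tfrac12\operatorname{erfc}(-x/\sqrt2)$; the grid of $m$ and $\tau$ values is arbitrary.

```python
import math

def I(theta, m, steps=4000):
    """Pr{F >= theta}, F ~ F(m-1, m-1), by Simpson's rule in w = x/(1+x); m >= 4."""
    a = (m - 1) / 2.0
    log_beta = 2 * math.lgamma(a) - math.lgamma(2 * a)
    w0 = theta / (1.0 + theta)

    def density(w):
        if w >= 1.0:
            return 0.0
        return math.exp((a - 1) * (math.log(w) + math.log1p(-w)) - log_beta)

    h = (1.0 - w0) / steps
    total = density(w0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(w0 + i * h)
    return total * h / 3

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

gaps = {}
for m in (50, 200, 800):
    for tau in (0.5, 1.0, 2.0):
        theta = 1 + 2 * tau / math.sqrt(m)
        gaps[(m, tau)] = I(theta, m) - Phi(-tau)
        print(m, tau, gaps[(m, tau)], gaps[(m, tau)] * math.sqrt(m))
```

The rescaled gaps in the last column stay bounded, as Lemma 2 asserts, while the raw gaps decrease with $m$.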







Now let $K(\theta,m)=(\theta-1)I(\theta,m)$ for $\theta\ge1$ and take $m\ge4$. Then $K(\theta,m)$ is continuous in $\theta$, $K(1,m)=0$, and $K(\theta,m)>0$ for $1<\theta<\infty$. From Lemma 1, $K(\theta,m)<\theta[2\theta^{1/2}/(1+\theta)]^{m-1}$; therefore $K(\theta,m)<4$ and $K(\theta,m)\to0$ as $\theta\to\infty$. Hence $K(\theta,m)$ has an absolute maximum with respect to $\theta$, $1<\theta<\infty$. Using the integral representation of $I(\theta,m)$, straightforward differentiation yields

$\dfrac{d^2K(\theta,m)}{d\theta^2}=\dfrac{(m-3)\,\theta^{(m-5)/2}\,\bigl(\theta^2-2\theta(m+1)/(m-3)+1\bigr)}{2(\theta+1)^m\,B\bigl((m-1)/2,(m-1)/2\bigr)}.$

Hence

$\dfrac{d^2K(\theta,m)}{d\theta^2}\;\begin{cases}<0 & \text{if } 1\le\theta<\theta_0,\\ =0 & \text{if } \theta=\theta_0,\\ >0 & \text{if } \theta>\theta_0,\end{cases}$

where $\theta_0=1+2^{3/2}(m-1)^{1/2}/(m-3)+4/(m-3)$. It follows that $K(\theta,m)$ has exactly one maximum w.r.t. $\theta$, and that the maximizing value of $\theta$ satisfies the inequality $1<\theta<\theta_0$. Next, let $\theta=1+2\tau m^{-1/2}$, $\tau\ge0$, $\tau_0=(\theta_0-1)m^{1/2}/2=O(1)$, so that $m^{1/2}\max_{\theta\le\theta_0}K(\theta,m)=\max_{\tau\le\tau_0}2\tau I(\theta,m)=\max_{\tau\le\tau_0}2\tau\Phi(-\tau)+O(m^{-1/2})$ by Lemma 2. The root of the equation $\Phi(-\tau)-\tau\varphi(\tau)=0$ gives the $\tau$-value which maximizes $\tau\Phi(-\tau)$; call this root $\tau'$ (approximately .75) and let $c=2\tau'\Phi(-\tau')$ (approximately .34). We have proved the following result.

LEMMA 3: For $\theta\ge1$, $(\theta-1)I(\theta,m)$ has a unique, absolute maximum with respect to $\theta$ at $\theta=1+2\tau'm^{-1/2}+o(m^{-1/2})$, and $\max_\theta(\theta-1)I(\theta,m)=cm^{-1/2}+O(m^{-1})$, where $\tau'$, $c$ are positive constants defined above.
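The constants in Lemma 3 are easy to compute. This Python sketch finds $\tau'$ by bisection on $\Phi(-\tau)-\tau\varphi(\tau)$, which is positive to the left of the root and negative to the right on $[0.1,2]$, and evaluates $c=2\tau'\Phi(-\tau')$. It then compares $m^{1/2}\max_\theta(\theta-1)I(\theta,m)$, computed from a Simpson-rule evaluation of $I$ over a grid of $\theta=1+2\tau m^{-1/2}$ (both the quadrature and the grid are assumptions of the illustration), with $c$. The computation gives $\tau'\approx0.752$ and $c\approx0.340$, and the finite-$m$ maxima approach $c$ from above.

```python
import math

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def tau_prime(lo=0.1, hi=2.0, iters=100):
    """Root of Phi(-tau) - tau*phi(tau) = 0, the maximizer of tau*Phi(-tau)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if Phi(-mid) - mid * phi(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def I(theta, m, steps=1000):
    """Pr{F >= theta}, F ~ F(m-1, m-1), by Simpson's rule in w = x/(1+x); m >= 4."""
    a = (m - 1) / 2.0
    log_beta = 2 * math.lgamma(a) - math.lgamma(2 * a)
    w0 = theta / (1.0 + theta)

    def density(w):
        if w >= 1.0:
            return 0.0
        return math.exp((a - 1) * (math.log(w) + math.log1p(-w)) - log_beta)

    h = (1.0 - w0) / steps
    total = density(w0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(w0 + i * h)
    return total * h / 3

def scaled_max(m):
    """m^(1/2) * max over a tau-grid of (theta - 1) I(theta, m)."""
    best = 0.0
    for j in range(51):
        tau = 0.5 + 0.01 * j
        theta = 1 + 2 * tau / math.sqrt(m)
        best = max(best, math.sqrt(m) * (theta - 1) * I(theta, m))
    return best

tp = tau_prime()
c = 2 * tp * Phi(-tp)
print(tp, c)
for m in (100, 400, 1600):
    print(m, scaled_max(m))
```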

Returning to the expression for the risk, we find that $\max_\theta R_{1n}(\theta,m)=\max_{\theta\ge1}R_{1n}(\theta,m)=(1-m/n)^{-1}(1+cm^{-1/2}+O(m^{-1}))$ by Lemma 3. Suppose now that $\max_\theta R_{1n}(\theta,m)\to1$ as $n\to\infty$. Then of necessity $(1-m/n)^{-1}\to1$ and $cm^{-1/2}\to0$ as $n\to\infty$, implying respectively that $m/n\to0$ and $m\to\infty$ as $n\to\infty$. Since the converse is obviously true, we have proved that $m(n)$ is u.c. if and only if

(*) $m(n)/n\to0$ and $m(n)\to\infty$ as $n\to\infty$.

That is, (*) characterizes the class of u.c. solutions for $\hat\mu_1$. We can now determine the m.m.sol.

THEOREM 2: The minimax solution for $\hat\mu_1$ is $m(n)=(cn/2)^{2/3}+O(n^{1/2})$ and $\min_m\max_\theta R_{1n}(\theta,m)=1+3(c/2)^{2/3}n^{-1/3}+O(n^{-2/3})$.

PROOF: For all u.c. solutions, and hence for the m.m.sol., $\max_\theta R_{1n}(\theta,m)=1+m/n+cm^{-1/2}+\epsilon$, where $m=m(n)$ and $\epsilon=cm^{1/2}/n+O(m^2/n^2)+O(m^{-1})=o(m/n)+o(m^{-1/2})$. Both $m/n$ and $m^{-1/2}$ converge to zero as $n\to\infty$, one being an increasing function of $m$, the other a decreasing function. Hence a necessary condition for a minimum is that $m/n$ and $m^{-1/2}$ be of the same order of magnitude. Hence $\max_\theta R_{1n}(\theta,m)-1\sim m/n+cm^{-1/2}$ as $n\to\infty$. The expression $m/n+cm^{-1/2}$ has a unique, absolute minimum at $m=(cn/2)^{2/3}$, and finally, taking into account the order of magnitude of $\epsilon$, we obtain Theorem 2.
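The last step of the proof is an elementary minimization. This Python sketch takes the approximate value $c\approx0.34$ (a rounded constant, assumed here rather than recomputed), minimizes $m/n+cm^{-1/2}$ over integers $m$ for one assumed $n$, and compares the result with $m=(cn/2)^{2/3}$ and with the leading term $3(c/2)^{2/3}n^{-1/3}$ of Theorem 2. It ignores the remainder $\epsilon$, so it illustrates only the leading-order behaviour.

```python
C_APPROX = 0.34   # rounded value of c from Lemma 3 (assumption)

def leading_excess(m, n, c=C_APPROX):
    """Leading terms m/n + c*m^(-1/2) of max_theta R_1n(theta, m) - 1."""
    return m / n + c * m ** -0.5

def minimax_m(n, c=C_APPROX):
    """Continuous minimizer (c n / 2)^(2/3)."""
    return (c * n / 2) ** (2 / 3)

n = 10**6
m_star = minimax_m(n)
m_int = min(range(2, 20000), key=lambda m: leading_excess(m, n))
predicted = 3 * (C_APPROX / 2) ** (2 / 3) * n ** (-1 / 3)
print(m_star, m_int, leading_excess(m_star, n), predicted)
```

At the minimizer the two competing terms are in the ratio $1:2$, which is where the factor 3 in Theorem 2 comes from.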


4. A class of estimators. The estimator $\hat\mu_1$, though undoubtedly satisfactory in some instances, is a relatively simple function of the observations, and one may therefore ask whether better (in the sense of smaller risk) estimators exist and, if they do, whether results like Theorem 2 can be found for such better estimators. We shall see that the answer to both questions is in the affirmative.

We begin by defining $\hat\mu_2 = (N_1\theta\bar X_{N_1} + N_2\bar Y_{N_2})/(N_1\theta + N_2)$, whose risk is $R_{2n}(\theta, m) = n\max(1, \theta)E\{1/(N_1\theta + N_2)\}$; then $R_{2n}(\theta, m)$ is a lower bound for the risk of all estimators of form (1) by Theorem 1(c). Though $\hat\mu_2$ has form (1), it is not in fact an estimator, except in the case when $\theta$ becomes known at the completion of sampling. If $\hat\theta \to \theta$ in probability, a natural way to obtain a bona fide estimator from $\hat\mu_2$ is to replace $\theta$ by $\hat\theta$. It is mathematically convenient to use a $\hat\theta$ based on the first stage only; we take $\hat\theta = 1/R$ and define
$\hat\mu_3 = (N_1\bar X_{N_1} + N_2R\bar Y_{N_2})/(N_1 + N_2R)$.

The motivation for $\hat\mu_3$ may be seen in another way. As mentioned in Section 2, for a one-stage experiment with $\theta$ known, $\eta = (n_1\theta\bar X_{n_1} + n_2\bar Y_{n_2})/(n_1\theta + n_2)$ is the UMVU estimator of $\mu$. When $\theta$ is unknown, it is customary in practice to use the estimator obtained from $\eta$ by replacing $\theta$ by $\hat\theta$. Suppose one takes $\hat\theta$ to be the usual estimator of $\theta$, but based on $2\min(n_1, n_2)$ observations, and replaces the numbers $n_1$, $n_2$ in $\eta$ by their values in our procedure, i.e., by the random variables $N_1$, $N_2$. The result is $\hat\mu_3$.
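The following is a minimal simulation sketch of how $\hat\mu_3$ is formed. It rests on assumptions that are ours rather than the text's, inferred from the risk formulas of this section:

- $\theta = \sigma_Y^2/\sigma_X^2$.
- The first stage takes $m$ observations from each population.
- $R$ is the ratio $s_X^2/s_Y^2$ of first-stage sample variances, so $\hat\theta = 1/R$ and $Z = R\theta$ is $F(m-1, m-1)$.
- The remaining $n - 2m$ observations all go to the population that appears less variable.

The helper names and the normal samplers are hypothetical.

```python
import random
import statistics


def two_stage_mu3(draw_x, draw_y, n: int, m: int) -> dict:
    """Hypothetical two-stage procedure producing mu_hat_3 (allocation rule assumed)."""
    x = draw_x(m)  # first stage: m observations from each population
    y = draw_y(m)
    # R = s_X^2 / s_Y^2 uses first-stage data only, so theta_hat = 1/R estimates sigma_Y^2 / sigma_X^2.
    R = statistics.variance(x) / statistics.variance(y)
    if 1.0 / R > 1.0:
        x = x + draw_x(n - 2 * m)  # Y looks noisier: the remaining n - 2m go to X
    else:
        y = y + draw_y(n - 2 * m)  # otherwise they go to Y
    N1, N2 = len(x), len(y)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    mu3 = (N1 * xbar + N2 * R * ybar) / (N1 + N2 * R)
    return {"mu3": mu3, "N1": N1, "N2": N2, "R": R, "xbar": xbar, "ybar": ybar}


rng = random.Random(12345)


def draw_x(k):
    return [rng.gauss(10.0, 1.0) for _ in range(k)]  # sigma_X = 1 (illustrative)


def draw_y(k):
    return [rng.gauss(10.0, 3.0) for _ in range(k)]  # sigma_Y = 3 (illustrative)


result = two_stage_mu3(draw_x, draw_y, n=200, m=20)
print(result["N1"], result["N2"], round(result["mu3"], 3))
```

Since both weights $N_1/(N_1 + N_2R)$ and $N_2R/(N_1 + N_2R)$ are positive, $\hat\mu_3$ always lies between the two sample means.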


Another estimator which might be considered is $\hat\mu_4$, the grand mean of all the observations: $\hat\mu_4 = (N_1\bar X_{N_1} + N_2\bar Y_{N_2})/n$. Then

$R_{4n}(\theta, m) = (1/n)\max(1, 1/\theta)E\{N_1 + N_2\theta\}$
$= (1/n)\max(1, 1/\theta)\{[n - m + m\theta][1 - I(\theta, m)] + [m + (n - m)\theta]I(\theta, m)\}$
$= \max(1, 1/\theta)\{1 + (\theta - 1)[m/n + (1 - 2m/n)I(\theta, m)]\}$,

so that $\sup_\theta R_{4n}(\theta,m) \ge (m/n)\sup_{\theta>1}(\theta - 1)$, which tends to $\infty$ as $\theta \to \infty$. We conclude that for $\hat\mu_4$ there exist no u.c. solutions and no non-trivial m.m. solutions. (Note that $\hat\mu_4$ would be worthy of consideration if it were known a priori that $\theta$ was close to unity, since $R_{4n}(1, m) = R_{2n}(1,m) = 1$ and since it can be shown that $R_{1n}(\theta,m) > R_{4n}(\theta,m)$ for $\tfrac12 \le \theta \le 2$.)

Before discussing the proposed estimator $\hat\mu_3$, it will be convenient to study first $\hat\mu_2$ and $R_{2n}(\theta, m)$. By evaluating the expectation,

$R_{2n}(\theta, m)$
$= n\max(1, \theta)\{[1 - I(\theta, m)]/[(n - m)\theta + m] + I(\theta, m)/(m\theta + n - m)\}$
$= \max(1, 1/\theta)[1 + r_1(\theta, m/n) + r_2(\theta, m/n)I(\theta, m)]$,

where $r_1(\theta, m/n) = (m(\theta - 1)/n\theta)/(1 - m(\theta - 1)/n\theta)$ and $r_2(\theta, m/n) = (1 - 2m/n)(\theta - 1)/\{1 + (m/n)(1 - m/n)(\theta - 1)^2/\theta\}$. As for $R_{4n}(\theta, m)$, we have $R_{2n}(\theta,m) = R_{2n}(1/\theta, m)$.
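The two expressions for $R_{2n}(\theta, m)$ are easy to mistype, so here is a small exact-arithmetic check (Python `fractions`) that they agree. It treats $I = I(\theta, m)$ as a free number in $[0, 1]$. It also checks the symmetry $R_{2n}(\theta,m) = R_{2n}(1/\theta, m)$. That check uses our assumption $I(1/\theta, m) = 1 - I(\theta, m)$, which holds because $Z$ and $1/Z$ have the same $F(m-1, m-1)$ distribution.

```python
from fractions import Fraction


def r1(theta, q):
    """r_1(theta, m/n) with q = m/n."""
    x = q * (theta - 1) / theta
    return x / (1 - x)


def r2(theta, q):
    """r_2(theta, m/n) with q = m/n."""
    return (1 - 2 * q) * (theta - 1) / (1 + q * (1 - q) * (theta - 1) ** 2 / theta)


def risk_direct(theta, m, n, I):
    """First line: n max(1, theta) {[1 - I]/[(n - m) theta + m] + I/(m theta + n - m)}."""
    return n * max(1, theta) * ((1 - I) / ((n - m) * theta + m) + I / (m * theta + n - m))


def risk_via_r(theta, m, n, I):
    """Second line: max(1, 1/theta) [1 + r_1 + r_2 I]."""
    q = Fraction(m, n)
    return max(1, 1 / theta) * (1 + r1(theta, q) + r2(theta, q) * I)


m, n = 10, 100
for theta in (Fraction(1), Fraction(3, 2), Fraction(4), Fraction(1, 3)):
    for I in (Fraction(0), Fraction(1, 7), Fraction(1, 2)):
        assert risk_direct(theta, m, n, I) == risk_via_r(theta, m, n, I)
print(float(risk_direct(Fraction(4), m, n, Fraction(1, 7))))
```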





1170 DONALD RICHTER 


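Now let $\hat\mu$ be any estimator of form (1) such that $\sup_\theta R_n(\theta, m) \le \max_\theta R_{2n}(\theta,m) + o(1)$. If $m/n \to 0$ and $m \to \infty$ as $n \to \infty$, then $\max_\theta R_{2n}(\theta, m)$, and hence $\sup_\theta R_n(\theta, m)$, tend to unity as $n \to \infty$. If on the other hand $\sup_\theta R_n(\theta,m) \to 1$ as $n \to \infty$, then, since $R_{2n}(\theta,m) \le R_n(\theta,m)$, $\sup_\theta R_{2n}(\theta,m) \to 1$ as $n \to \infty$. But $\sup_\theta R_{2n}(\theta,m) = \sup_{\theta \ge 1} R_{2n}(\theta, m)$ and, for $\theta \ge 1$, $R_{2n}(\theta,m) = 1 + r_1 + r_2I$ is the sum of three nonnegative terms. Therefore, (i) $\sup_{\theta \ge 1} r_1(\theta, m/n) \to 0$ as $n \to \infty$, and (ii) $\sup_{\theta \ge 1} r_2(\theta, m/n)I(\theta,m) \to 0$ as $n \to \infty$. Since $r_1(\theta,m/n) \ge m(\theta - 1)/n\theta$, (i) implies that $m/n \to 0$ as $n \to \infty$. For any fixed $\theta' > 1$, $r_2(\theta', m/n)$ is bounded away from zero, so (ii) implies that $I(\theta', m) \to 0$ as $n \to \infty$, and so $m \to \infty$ as $n \to \infty$. We have proved the following result.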

THEOREM 3: For any estimator of form (1) whose risk satisfies $\sup_\theta R_n(\theta,m) \le \max_\theta R_{2n}(\theta,m) + o(1)$ as $n \to \infty$, $m(n)$ is u.c. if and only if condition (*) holds.

At this point we introduce a lemma. 

Lemma 4: Let $\theta = 1 + 2rm^{-1/2}$ and let $C$ be a positive number. If $2r \ge C$, then $(\theta - 1)I(\theta, m) < m^{-1/2}e^{-C/64}$ for $m$, $C$ sufficiently large.

PROOF: Let $L(\theta,m) = m^{1/2}(\theta - 1)I(\theta,m) = 2rI(\theta,m) < 2r(2\theta^{1/2}/(1 + \theta))^{m-1}$ by Lemma 1. Since $2\theta^{1/2}/(1 + \theta) = 1 - (1/8)(\theta - 1)^2 + O((\theta - 1)^3)$, there exists a number $\theta_1 > 1$ such that $\log(2\theta^{1/2}/(1 + \theta)) < -(\theta - 1)^2/16$ for $1 < \theta \le \theta_1$. Then for $1 < \theta \le \theta_1$,

$\log L(\theta,m) < \log 2r + (m - 1)\log(2\theta^{1/2}/(1 + \theta))$
$< \log 2r - (m - 1)(\theta - 1)^2/16 \le \log 2r - (2r)^2/32$.

For $2r \ge C$ and $C$ larger than some constant, $\log L(\theta,m) < -(2r)^2/64 \le -C/64$, which proves the lemma for $\theta \le \theta_1$. If we now require $m^{1/2} > 2r'/(\theta_1 - 1)$, then, by Lemma 3, $(\theta - 1)I(\theta, m)$ is a strictly decreasing function of $\theta$ for $\theta \ge \theta_1$, which completes the proof.
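The proof of Lemma 4 rests on two facts. The first is the expansion $2\theta^{1/2}/(1+\theta) = 1 - (\theta-1)^2/8 + O((\theta-1)^3)$. The second is the existence of some $\theta_1 > 1$ below which $\log(2\theta^{1/2}/(1+\theta)) < -(\theta-1)^2/16$. Below is a quick floating-point check in Python. The grid and the trial value $\theta_1 = 1.5$ are our choices, not the paper's.

```python
import math


def base(theta: float) -> float:
    """2 sqrt(theta) / (1 + theta), the base of the Lemma 1 bound on I(theta, m)."""
    return 2.0 * math.sqrt(theta) / (1.0 + theta)


# The remainder after 1 - h^2/8 should shrink like h^3 (its coefficient tends to 1/8).
for h in (1e-1, 1e-2, 1e-3):
    remainder = base(1.0 + h) - (1.0 - h * h / 8.0)
    print(f"h = {h:g}: remainder / h^3 = {remainder / h**3:.5f}")

# The inequality used in the proof holds on a grid over (1, 1.5] ...
grid = [1.0 + k / 1000.0 for k in range(1, 501)]
holds_up_to_1_5 = all(math.log(base(t)) < -((t - 1.0) ** 2) / 16.0 for t in grid)
# ... but fails at theta = 2, so theta_1 really must be taken reasonably close to 1.
holds_at_2 = math.log(base(2.0)) < -1.0 / 16.0
print(holds_up_to_1_5, holds_at_2)
```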
We want next to determine the minimax solution for $\hat\mu_2$. For this purpose, we may assume with no loss of generality that $\theta \ge 1$ and (*) holds. Then $R_{2n}(\theta,m) = 1 + r_1(\theta, m/n) + r_2(\theta, m/n)I(\theta, m)$, where

$0 < r_1(\theta,m/n) = \sum_{i=1}^{\infty}(m(\theta - 1)/n\theta)^i < 2m(\theta - 1)/n\theta < 1$,

$r_2(\theta,m/n) \le \theta - 1$ for $\theta \ge 1$, and $r_2(\theta,m/n) = (\theta - 1)[1 + O(m/n)]$ uniformly in any bounded $\theta$ interval. We will employ a four-fold dissection of the set of $\theta$ values in order to find $\sup_\theta R_{2n}(\theta, m)$.
Case 1: $\{\theta: 1 \le \theta \le 1 + Cm^{-1/2}\}$, where $C$ is a large number. Since $r_1(\theta, m/n) < 2Cm^{1/2}/n$, one finds from Lemma 3 that
$\max_{1 \le \theta \le 1 + Cm^{-1/2}} R_{2n}(\theta,m) = 1 + cm^{-1/2} + O(m^{-1}) + O(m^{1/2}/n) = p_1$, say.
Case 2: $\{\theta: \theta > 64n/m\}$. Since
$R_{2n}(\theta, m) \to (1 - m/n)^{-1} = 1 + m/n + m^2/n^2 + \cdots = p_2$,
say, as $\theta \to \infty$, and since $(1 - m/n)^{-1} - 1 - r_1(\theta, m/n) \ge m/n\theta$, we have $p_2 - R_{2n}(\theta,m) \ge m/n\theta - r_2(\theta, m/n)I(\theta, m)$. From Lemma 1,

$r_2(\theta, m/n)I(\theta, m) < 4(4/\theta)^{(m-3)/2}$,

so that, for $m \ge 7$, $p_2 - R_{2n}(\theta,m) > m/n\theta - 4(4/\theta)^2$, which is positive for $\theta > 64n/m$. Hence $\sup_{\theta > 64n/m} R_{2n}(\theta, m) = p_2$.

Case 3: $\{\theta: 1 + Cm^{-1/2} < \theta < \theta'\}$, where $\theta'$ is a number satisfying $4 < \theta' < 128$. Then

$r_1(\theta, m/n) \le m(\theta - 1)/n\theta + (m(\theta' - 1)/n\theta')^2 + \cdots$
and
$r_2(\theta, m/n)I(\theta, m) \le (\theta - 1)I(\theta,m) < m^{-1/2}e^{-C/64}$

by Lemma 4. Suppose $m/n = o(m^{-1/2})$; then

$R_{2n}(\theta,m) < 1 + m^{-1/2}e^{-C/64} + o(m^{-1/2})$,

which is smaller than $p_1$ for $C$, $m$ large enough. Otherwise suppose $m^{-1/2} = O(m/n)$; then

$R_{2n}(\theta,m) < 1 + m(\theta - 1)/n\theta + e^{-C/64}O(m/n) + O(m^2/n^2)$,

which is less than $p_2$ for $C$, $m$ sufficiently large.
Case 4: $\{\theta: \theta' \le \theta \le 64n/m\}$. Then

$r_1(\theta,m/n) < m/n + (63/64)m^2/n^2 + O(m^3/n^3)$
and
$r_2(\theta, m/n)I(\theta,m) < 4(4/\theta)^{(m-3)/2} \le 4\delta^{(m-3)/2}$,

where $\delta = 4/\theta' < 1$. Suppose $\delta^{m/2} = o(m^2/n^2)$; then $R_{2n}(\theta,m) < 1 + m/n + (63/64)m^2/n^2 + o(m^2/n^2)$, which is smaller than $p_2$. Else suppose $m^2/n^2 = O(\delta^{m/2})$; then $R_{2n}(\theta,m) < 1 + O(\delta^{m/4})$, which is smaller than $p_1$.
Combining the results of the dissection, we have

(3) $\sup_\theta R_{2n}(\theta,m) = \max[p_1, p_2]$

for sufficiently large $m$. To minimize (3), we equate $p_1$ and $p_2$, obtaining
(4) $1 + c^{2/3}n^{-1/3} + O(n^{-2/3})$,
(5) $m(n) = (cn)^{2/3} + O(n^{1/3})$

as the m.m. risk and m.m. sol., respectively, for $R_{2n}(\theta, m)$. Moreover, if we let $\epsilon_n > 0$ be any function of $n$ which is $O(n^{-2/3})$, the results (4), (5) will hold for any $R_n(\theta, m)$ such that $R_{2n}(\theta,m) \le R_n(\theta,m) \le R_{2n}(\theta,m) + \epsilon_n$; Theorem 3 will apply also. We summarize these results as follows.

THEOREM 4: Let $\hat\mu$ be any estimator of form (1) whose risk satisfies $R_n(\theta,m) \le R_{2n}(\theta, m) + \epsilon_n$, where $\epsilon_n = O(n^{-2/3})$. Then for $\hat\mu$ the class of u.c. sols. is characterized by (*), and the m.m. risk and the m.m. sol. are given by (4) and (5) respectively.

Let us return now to the estimator $\hat\mu_3 = A_3\bar X_{N_1} + B_3\bar Y_{N_2}$, where $A_3 = N_1/(N_1 + N_2R)$ and $B_3 = 1 - A_3 = N_2R/(N_1 + N_2R)$. By Theorem 1, its risk is

$R_{3n}(\theta,m) = n\max(1, 1/\theta)E\{A_3^2/N_1 + \theta B_3^2/N_2\}$
$= n\max(1, 1/\theta)E\{(N_1 + N_2R^2\theta)/(N_1 + N_2R)^2\}$.


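Let $Z = R\theta$, so that $Z$ obeys the F-distribution with $m - 1$, $m - 1$ d.f.; denote by $F(z)$ the distribution function of $Z$, so that $F(z) = 1 - I(z, m)$. Then $R_{3n}(\theta, m) = \max(\theta, 1)E\{(N_1\theta/n + N_2Z^2/n)/(N_1\theta/n + N_2Z/n)^2\}$, or

$R_{3n}(\theta,m) = \max(\theta,1)\int_0^\infty T_z\,dF(z)$, where

(6) $T_z = ((1 - \alpha)\theta + \alpha z^2)/((1 - \alpha)\theta + \alpha z)^2$, with $\alpha = m/n$ if $z < \theta$ and $\alpha = 1 - m/n$ if $z \ge \theta$;

that is,

(7) $R_{3n}(\theta,m) = \max(\theta,1)\Big\{\int_0^\theta \frac{(1 - m/n)\theta + (m/n)z^2}{[(1 - m/n)\theta + (m/n)z]^2}\,dF(z) + \int_\theta^\infty \frac{(m/n)\theta + (1 - m/n)z^2}{[(m/n)\theta + (1 - m/n)z]^2}\,dF(z)\Big\}$,

where $dF(z) = [B((m - 1)/2, (m - 1)/2)]^{-1}z^{(m-3)/2}(1 + z)^{-(m-1)}\,dz$. By substituting $w = 1/z$ in (7), using the fact that $dF(z) = -dF(1/z)$, and simplifying, one shows that $R_{3n}(\theta,m) = R_{3n}(1/\theta, m)$.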

Next, write $T'_z = 1/((1 - \alpha)\theta + \alpha)$, so that $R_{2n}(\theta,m) = \max(\theta,1)\int_0^\infty T'_z\,dF(z)$. Using (6), define $D_n(\theta,m) = R_{3n}(\theta,m) - R_{2n}(\theta,m) = \max(\theta, 1)\int_0^\infty U_z\,dF(z)$, where $U_z = T_z - T'_z = \alpha(1 - \alpha)\theta(z - 1)^2/[((1 - \alpha)\theta + \alpha z)^2((1 - \alpha)\theta + \alpha)]$. Obviously $D_n(\theta, m) \ge 0$, and $D_n(\theta, m) = D_n(1/\theta, m)$ by the properties of the risk functions. Moreover, $D_n(\theta,m) > 0$ for $0 < \theta < \infty$, since $U_z > 0$ except when $z = 1$, an event of measure zero. And since $U_z \le (m/n)(1 - m/n)^{-1}(z - 1)^2/\theta$ for all $z$ and for $\theta \ge 1$, we have $D_n(\theta,m) \le (m/n)(1 - m/n)^{-1}\int_0^\infty(z - 1)^2\,dF(z)$ for all $\theta$. But $\int_0^\infty(z - 1)^2\,dF(z) = 4(m + 1)/(m^2 - 8m + 15) \le 56/m$ for $m \ge 6$. Since $(1 - m/n)^{-1} < 2$, we have proved that

(8) $D_n(\theta,m) < 112/n$, for all $\theta$ and $6 \le m < n/2$.


Therefore, the solution for the proposed estimator $\hat\mu_3$ is specified by Theorem 4.
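The bound (8) combines three ingredients:

- the identity for $U_z$;
- the inequality $U_z \le (m/n)(1-m/n)^{-1}(z-1)^2/\theta$ for $\theta \ge 1$;
- the second moment of $Z - 1$.

The Python sketch below checks the first two exactly on a small grid of rational $\theta$, $z$ and $q = m/n \le 1/2$. It checks the third from the standard mean and variance of an $F(d, d)$ variable with $d = m - 1$. The grid is our choice.

```python
from fractions import Fraction


def T(theta, z, a):
    """T_z in (6) for a given alpha = a."""
    return ((1 - a) * theta + a * z * z) / ((1 - a) * theta + a * z) ** 2


def T_prime(theta, a):
    """T'_z = 1 / ((1 - alpha) theta + alpha)."""
    return 1 / ((1 - a) * theta + a)


def U_closed(theta, z, a):
    """Closed form of U_z = T_z - T'_z stated in the text."""
    return a * (1 - a) * theta * (z - 1) ** 2 / (
        ((1 - a) * theta + a * z) ** 2 * ((1 - a) * theta + a))


def alpha(theta, z, q):
    """alpha = m/n when z < theta, and 1 - m/n otherwise (q = m/n)."""
    return q if z < theta else 1 - q


def second_moment_about_one(m):
    """E(Z - 1)^2 for Z ~ F(d, d), d = m - 1, from the usual F mean and variance (m >= 6)."""
    d = Fraction(m - 1)
    mean = d / (d - 2)
    var = 2 * d ** 2 * (2 * d - 2) / (d * (d - 2) ** 2 * (d - 4))
    return var + (mean - 1) ** 2


thetas = [Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5), Fraction(10)]
zs = [Fraction(0), Fraction(1, 3), Fraction(1), Fraction(2), Fraction(7), Fraction(20)]
qs = [Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)]

for theta in thetas:
    for z in zs:
        for q in qs:
            a = alpha(theta, z, q)
            u = T(theta, z, a) - T_prime(theta, a)
            assert u == U_closed(theta, z, a)
            assert u <= q / (1 - q) * (z - 1) ** 2 / theta

for m in range(6, 200):
    mom = second_moment_about_one(m)
    assert mom == Fraction(4 * (m + 1), m * m - 8 * m + 15)
    assert mom <= Fraction(56, m)
print("identity, bound and moment formula agree on the grid")
```

Multiplying the last two bounds, $(m/n)\cdot 2 \cdot 56/m = 112/n$ when $m/n < 1/2$, which is (8).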

The last result implies that the m.m. risk for $\hat\mu_3$ is $\sim 1 + c^{2/3}n^{-1/3}$, and from Theorem 2 the m.m. risk for $\hat\mu_1$ is $\sim 1 + 3(c/2)^{2/3}n^{-1/3}$. Since $c^{2/3} < 3(c/2)^{2/3}$, this provides one reason for preferring $\hat\mu_3$ to $\hat\mu_1$. An additional reason is given by the following result: given any $\theta$, $0 < \theta < \infty$, one can find an $m_0$ such that $R_{3n}(\theta, m) < R_{1n}(\theta, m)$ for $m \ge m_0$. This is easy to see since $R_{1n}(\theta, m) - R_{2n}(\theta,m) = \int_0^\infty(W - \theta T'_z)\,dF(z)$, where $W = (1 - m/n)^{-1}$ if $z < \theta$, $W = \theta(1 - m/n)^{-1}$ if $z \ge \theta$, and where we assume with no loss of generality that $\theta \ge 1$. Noting that $W - \theta T'_z \ge m/\theta n$ for all $z$, and introducing (8), one obtains the desired result.
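To make the comparison concrete, the Python sketch below solves the two leading-order minimax problems. For $\hat\mu_1$ (Theorem 2) it minimizes $m/n + cm^{-1/2}$. For $\hat\mu_3$ (Theorem 4) it sets $p_1 = p_2$, i.e. $cm^{-1/2} = m/n$. Lower-order terms are ignored and $c$ is left as a free parameter. The ratio of the excess risks comes out as $3 \cdot 2^{-2/3} \approx 1.89$ whatever $c$ is.

```python
import math


def minimax_mu1(n: float, c: float):
    """Theorem 2, leading order: m = (cn/2)^(2/3), excess risk 3 (c/2)^(2/3) n^(-1/3)."""
    return (c * n / 2) ** (2 / 3), 3 * (c / 2) ** (2 / 3) * n ** (-1 / 3)


def minimax_mu3(n: float, c: float):
    """Theorem 4, leading order: c m^(-1/2) = m/n gives m = (cn)^(2/3), excess risk c^(2/3) n^(-1/3)."""
    return (c * n) ** (2 / 3), c ** (2 / 3) * n ** (-1 / 3)


for c in (0.1, 0.5, 2.0):
    n = 1e6
    (m1, e1), (m3, e3) = minimax_mu1(n, c), minimax_mu3(n, c)
    print(f"c = {c}: m1 = {m1:.1f}, m3 = {m3:.1f}, excess-risk ratio = {e1 / e3:.4f}")
```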










We see, then, that the main result of this paper is embodied in Theorem 4, where it is shown that the solution (5) holds throughout a certain class of best estimators. Moreover, the inequality (8) shows that $\hat\mu_3$ is a member of this class. It follows that any estimator of form (1) with uniformly smaller risk than that of $\hat\mu_3$, if one exists, has the same large-sample solution.

The author is deeply indebted to Professor Wassily Hoeffding for suggesting this problem, and for his advice and encouragement throughout the course of the work. The author is grateful to editor W. Kruskal and the referee for their helpful comments.


REFERENCES

[1] A. C. Berry, "The accuracy of the Gaussian approximation to the sum of independent variates," Trans. Amer. Math. Soc., Vol. 49 (1941), pp. 122-136.

[2] S. G. Ghurye and Herbert Robbins, "Two-stage procedures for estimating the difference between means," Biometrika, Vol. 41 (1954), pp. 146-152.

[3] Joseph Putter, "Sur une méthode de double échantillonnage pour estimer la moyenne d'une population laplacienne stratifiée," Rev. Inst. Internat. Stat., Part 3 (1951), pp. 1-8.

[4] Herbert Robbins, "Some aspects of the sequential design of experiments," Bull. Amer. Math. Soc., Vol. 58 (1952), pp. 527-535.








RANK-SUM TESTS FOR DISPERSIONS¹


By A. R. ANSARI AND R. A. BRADLEY²
Virginia Agricultural Experiment Station, Virginia Polytechnic Institute 


1. Summary. This paper deals with non-parametric two-sample tests on dis- 
persions. Two samples, X- and Y-samples of m and n independent observations 
from populations with continuous cumulative distribution functions F(u) and 
G(u) respectively, are considered. It is required for the basic test that the dif- 
ference in locations (medians) of the two populations be known and, when this 
is so, the two samples may be adjusted to have equal locations. Taking these 
location parameters to be zero without loss of generality, we test the hypothesis 
that $G(u) = F(u)$ against alternatives of the form $G(u) = F(\theta u)$, $\theta \ne 1$. The
two samples are ordered in a single joint array and ranks are assigned from each 
end of the joint array towards the middle. The statistic used is W, the sum of 
ranks for the X-sample. 

The distribution of W is studied and tables of significant values of W are 
provided for $m + n \le 20$ and both upper- and lower-tail significance levels
.005, .01, .025 and .05. The first four moments of W are developed and a normal 
approximation to the null distribution of W is devised. 

Large-sample properties of the W-test are considered. A proof of limiting 
normality is based on a theorem of Chernoff and Savage. Consistency of the 
W-test is indicated and its relative efficiency in comparison with the variance- 
ratio F-test is obtained as $6/\pi^2$ when $F(u)$ is the normal distribution function.

Other non-parametric tests of dispersions are reviewed. The W-test is less 
efficient asymptotically than some of these other tests but is easier to apply, 
particularly with the tables provided. 

A modified test is suggested for the case where the difference in population 
locations is not known. This involves replacing the two original samples by two 
corresponding samples of deviations from sample medians. The procedure of 
the W-test is applied to the two samples of deviations. The properties of the 
modified test have not been investigated except for a sampling study of rather 
limited scope. That study indicates that the moments of W for the modified 
test are not greatly different from those under the basic procedure. 


2. Introduction. Let $X_1, \cdots, X_m$ and $Y_1, \cdots, Y_n$ represent two independent
samples of sizes m and n of independent observations from two populations 
with continuous cumulative distribution functions (c.d.f.’s), F(u) and G(u) 


Received August 17, 1959; revised December 21, 1959. 

¹ Research sponsored by the Statistics Branch, Office of Naval Research, U. S. Navy. Reproduction in whole or in part is permitted for any purpose of the United States Government.

² Present address: Department of Statistics, the Florida State University, Tallahassee, Florida.










RANK-SUM TESTS FOR DISPERSIONS 1175 


respectively. We shall assume for the most part that the difference in location parameters (medians) $\mu_X - \mu_Y$ of the two populations is known. It may then be taken to be zero, for we may adjust the initial samples by subtracting $\mu_X - \mu_Y$ from each X-observation. The parameters $\mu_X$ and $\mu_Y$ need not be known, but no generality is lost in assuming $\mu_X = \mu_Y = 0$ in what follows. Then $F(u)$ and $G(u)$ are assumed to be of the same form and to differ at most in the value of a scale parameter $\theta$, so that $G(u) = F(\theta u)$. We develop a rank-order test of the null hypothesis,

(1) $H_0$: $\theta = 1$, i.e., $G(u) = F(u)$,

against either one-sided or two-sided alternatives to $H_0$.


The two samples (adjusted by $\mu_X - \mu_Y$ if necessary) are ranked or ordered in a combined array represented by

(2) $Z_1, \cdots, Z_{m+n}$.

But ranks are assigned from both ends of (2), beginning with unity and working towards the center. If $m + n$ is even, we have the array of ranks

(3) $1, 2, 3, \cdots, (m + n)/2, (m + n)/2, \cdots, 3, 2, 1$;

and, if $m + n$ is odd, we have

(4) $1, 2, 3, \cdots, (m + n - 1)/2, (m + n + 1)/2, (m + n - 1)/2, \cdots, 3, 2, 1$.

The test statistic to be considered is

(5) $W = \sum R(Z_i)$,

the sum of the ranks in (3) or (4) associated with the X-sample. An alternative form, equivalent to (5) and more useful in mathematical considerations, is

(6) $W = \sum_{i=1}^{p} i\delta_i + \sum_{i=p+1}^{m+n}(m + n + 1 - i)\delta_i$,

where $\delta_i = 1$ if $Z_i$ is an X-observation and $\delta_i = 0$ otherwise, and where

$p = [(m + n + 1)/2]$,

the largest integer in $(m + n + 1)/2$. Small values of $W$ indicate larger dispersion for the X-sample and large values of $W$ indicate larger dispersion for the Y-sample. Small values of $W$ suggest that $\theta < 1$ and large values of $W$ suggest that $\theta > 1$. The test based on $W$ and its properties are discussed in the following sections.
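As an illustration of (3) to (6), the following Python sketch assigns the two-ended ranks to the combined ordered sample. It then computes $W$ both as the rank sum (5) and through the indicator form (6). The function names are ours. The sketch assumes no ties, which holds with probability one for continuous $F$ and $G$.

```python
def two_ended_ranks(N: int) -> list[int]:
    """Ranks (3) or (4): 1, 2, ... from each end of the combined array of size N."""
    return [min(i, N + 1 - i) for i in range(1, N + 1)]


def ansari_bradley_w(x, y) -> int:
    """W in (5): sum of the two-ended ranks carried by the X-observations."""
    combined = sorted([(v, True) for v in x] + [(v, False) for v in y])
    ranks = two_ended_ranks(len(combined))
    return sum(r for r, (_, is_x) in zip(ranks, combined) if is_x)


def w_from_indicators(delta) -> int:
    """W in (6), with delta[i-1] = 1 when Z_i is an X-observation."""
    N = len(delta)
    p = (N + 1) // 2  # largest integer in (m + n + 1)/2
    return (sum(i * delta[i - 1] for i in range(1, p + 1))
            + sum((N + 1 - i) * delta[i - 1] for i in range(p + 1, N + 1)))


x = [0.3, -2.1, 1.7]
y = [0.1, -0.4, 0.9, -1.2, 2.5]
combined = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
delta = [flag for _, flag in combined]
print(ansari_bradley_w(x, y), w_from_indicators(delta))  # both 7 for this data
```

Here the X's occupy positions 1, 5 and 7 of the combined array of eight, whose two-ended ranks are 1, 4 and 2.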

Freund and Ansari [5] proposed the W-test and seem to have been the first 
to make such a proposal. David and Barton [4] presented a generalized procedure 
that includes the W-test as a special case but they did not investigate the prop- 
erties of their method. Sukhatme ([13], [14]) has proposed two other rank order 











1176 A. R. ANSARI AND R. A. BRADLEY 


dispersion tests. The first of these is similar to the W-test but requires knowledge of $\mu_X$ and $\mu_Y$, not simply of $\mu_X - \mu_Y$.

It is of interest for comparative purposes to note that Sukhatme’s first test 
uses the statistic 


$T = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\chi(X_i, Y_j)$,

where

$\chi(X, Y) = 1$ if $0 < X < Y$ or $Y < X < 0$,
$\chi(X, Y) = 0$ otherwise.

Let $n_-(X)$, $n_-(Y)$, $n_+(X)$ and $n_+(Y)$ indicate numbers of negative and positive X- and Y-observations. Then

$mnT = \sum R'(Z) - \tfrac12[n_-(X)\{n_-(X) + 1\} + n_+(X)\{n_+(X) + 1\}]$,

where $\sum R'(Z)$ is the sum of ranks associated with the X-sample when the ranking is modified as in the array

$1, 2, 3, \cdots, n_-, n_+, \cdots, 3, 2, 1$,

$n_- = n_-(X) + n_-(Y)$, $n_+ = n_+(X) + n_+(Y)$. The statistic $T$ depends on rankings of positive and negative observations separately and on the numbers of positive and negative values of $X$. Our statistic $W$, although similar, avoids attaching any meaning to the zero point of the scale of $X$.
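The identity for $mnT$ can be checked by brute force. The Python sketch below (helper names are ours) counts the pairs in the definition of $T$ directly and compares the count with the modified-rank expression. It uses randomly generated samples that straddle zero.

```python
import random


def mn_T_direct(x, y) -> int:
    """mn T = number of pairs with 0 < X < Y or Y < X < 0."""
    return sum(1 for a in x for b in y if (0 < a < b) or (b < a < 0))


def mn_T_from_ranks(x, y) -> int:
    """Sum of modified ranks R' of the X's minus the two triangular numbers."""
    neg = sorted([(v, True) for v in x if v < 0] + [(v, False) for v in y if v < 0])
    pos = sorted([(v, True) for v in x if v > 0] + [(v, False) for v in y if v > 0])
    rank_sum = sum(k for k, (_, is_x) in enumerate(neg, start=1) if is_x)  # 1, 2, ..., n_-
    rank_sum += sum(k for k, (_, is_x) in enumerate(reversed(pos), start=1) if is_x)  # n_+, ..., 2, 1
    nx_neg = sum(1 for v in x if v < 0)
    nx_pos = sum(1 for v in x if v > 0)
    return rank_sum - (nx_neg * (nx_neg + 1) + nx_pos * (nx_pos + 1)) // 2


rng = random.Random(2024)
for _ in range(200):
    x = [rng.gauss(0.0, 1.0) for _ in range(rng.randint(1, 8))]
    y = [rng.gauss(0.0, 2.0) for _ in range(rng.randint(1, 8))]
    assert mn_T_direct(x, y) == mn_T_from_ranks(x, y)
print("identity holds on 200 random pairs of samples")
```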

A statistic W’, associated with W, could also be used for a test procedure; 
W’ could be obtained directly by ranking from the center of the array (2) 
towards the two ends beginning with unities if m + n is even and with a zero 
if m + n is odd. Now W’ is equivalent to W, since 


(7) $W' = \tfrac12m(m + n) + m - W$

if $m + n$ is even and

(8) $W' = \tfrac12m(m + n + 1) - W$

if $m + n$ is odd. $W'$ may be preferred to $W$ if a statistic is desired such that
large values of the statistic occur with larger dispersion for the X-sample, but 
tests based on W and W’ are equivalent in their properties. 
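Relations (7) and (8) follow because centre-out ranking replaces each two-ended rank $r$ by $N + 1 - r$, where $N = [(m+n)/2]$. The Python check below enumerates every placement of the X's for small $m$ and $n$. The centre-out rank formula is our own encoding of the description above.

```python
from itertools import combinations


def two_ended_ranks(N: int) -> list[int]:
    return [min(i, N + 1 - i) for i in range(1, N + 1)]


def center_out_ranks(N: int) -> list[int]:
    """Ranks counted from the centre: unities at the centre if N is even, a zero if N is odd."""
    return [abs(2 * i - (N + 1)) // 2 + (1 if N % 2 == 0 else 0) for i in range(1, N + 1)]


def check_w_prime(m: int, n: int) -> bool:
    """Compare W' computed directly with (7) or (8) for every placement of m X's."""
    N = m + n
    r, r_c = two_ended_ranks(N), center_out_ranks(N)
    for pos in combinations(range(N), m):
        w = sum(r[i] for i in pos)
        w_prime = sum(r_c[i] for i in pos)
        expected = m * N // 2 + m - w if N % 2 == 0 else m * (N + 1) // 2 - w
        if w_prime != expected:
            return False
    return True


print(center_out_ranks(6), center_out_ranks(7))
print(all(check_w_prime(m, n) for m in range(1, 6) for n in range(1, 6)))
```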

Other nonparametric tests on dispersions are available.³ We note papers by Rosenbaum [11], Kamat [6], Barton [2], Lehmann [7], Terry [15] and Mood [8]. In addition, tests have been proposed that are consistent against more varied departures from equality of two c.d.f.'s, but these will not be discussed. We do compare, when possible, the W-test with other tests on dispersions.


³ Sidney Siegel and John W. Tukey [12] in a recent paper have a test similar to the W-test. They rank the array (2) as $1, 4, 5, 8, 9, \cdots, 7, 6, 3, 2$, and this permits use of existing tables.







In a final section we discuss what may be done if $\mu_X - \mu_Y$ is not known. The
W-test cannot then be used directly because it is in some cases sensitive to 
differences in locations of the two populations. 


3. The Null Distribution of W. When $H_0$ is true, $\theta = 1$ and $F(u) \equiv G(u)$. Under these conditions, each of the $\binom{m+n}{m}$ distinct assignments of $m$ X's to the $m + n$ positions in (2) is equally likely, and each such assignment yields a value of $W$. Probabilities of occurrence of each distinct value of $W$ are obtained as the product of $1/\binom{m+n}{m}$ and the number of distinct assignments that yield that value of $W$. Tables for the cumulative distributions of $W$ for various values of $m$ and $n$ may then be prepared.
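For small samples the enumeration just described is easy to carry out directly. The Python sketch below (function names are ours) builds the exact null distribution of $W$ from all $\binom{m+n}{m}$ placements. It then reads off the smallest upper-tail critical value satisfying $P[W \ge W_0 \mid H_0] \le \alpha$. For $m = 3$, $n = 11$ it reproduces the first exact entries of Table 2.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def null_distribution(m: int, n: int) -> dict:
    """Exact P(W = w | H0) by enumerating every placement of the m X's among m + n positions."""
    N = m + n
    scores = [min(i, N + 1 - i) for i in range(1, N + 1)]
    counts = Counter(sum(scores[i] for i in pos) for pos in combinations(range(N), m))
    total = comb(N, m)
    return {w: Fraction(k, total) for w, k in sorted(counts.items())}


def lower_cdf(dist: dict, w0: int) -> Fraction:
    """P(W <= w0 | H0)."""
    return sum((p for w, p in dist.items() if w <= w0), Fraction(0))


def upper_critical_value(dist: dict, alpha: float):
    """Smallest w0 with P(W >= w0 | H0) <= alpha, or None if no such value exists."""
    for w0 in sorted(dist):
        if sum(p for w, p in dist.items() if w >= w0) <= alpha:
            return w0
    return None


dist = null_distribution(3, 11)
print([round(float(lower_cdf(dist, w)), 4) for w in range(4, 12)])
print(upper_critical_value(dist, 0.05))
```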

Ansari [1] has prepared tables showing the complete distributions of $W$ for $m = 2(1)10$, $m + n = 4(1)20$. In Table 1 we give only critical values of $W$ for significance levels .995, .99, .975, .95, .05, .025, .01, and .005 taken from the complete tables. If $W_0(\alpha)$ is a critical value of $W$ with significance level $\alpha$,

$P[W \ge W_0(\alpha) \mid H_0] \le \alpha$

for $\alpha \le .05$ and $P[W \le W_0(\alpha) \mid H_0] \le 1 - \alpha$ for $\alpha \ge .95$.

Both a recursion formula and a frequency generating function have been derived to facilitate consideration of distributions of $W$. Let $f(W \mid m, n)$ denote the frequency of occurrence of $W$ given the sample sizes $m$, $n$. (The corresponding probability is $P(W \mid m, n) = f(W \mid m, n)/\binom{m+n}{m}$.) The recursion formula for $m \ge 2$ is

(9) $f(W \mid m, n + 1) = f(W \mid m, n) + f(W - N - 1 \mid m - 1, n + 1)$,

where $m + n = 2N$ or $m + n = 2N + 1$ depending on whether $m + n$ is even or odd. Alternatively, (9) may be written

(10) $(m + n + 1)P(W \mid m, n + 1) = (n + 1)P(W \mid m, n) + mP(W - N - 1 \mid m - 1, n + 1)$.

The frequency generating function is

(11) $g(u, v) = \prod_{i=1}^{N}(1 + u^iv)^2$ if $m + n = 2N$,
$g(u, v) = (1 + u^{N+1}v)\prod_{i=1}^{N}(1 + u^iv)^2$ if $m + n = 2N + 1$.

The frequency function $f(W \mid m, n)$ is the coefficient of $u^Wv^m$ in the expansions of (11).
The recursion formula is very nearly obvious in the form (9). Consider the 





TABLE 1
Lower and Upper Significance Levels of W
[Table body and its continuation not recoverable from the scan. Columns: sample sizes m, n and critical values of W at significance levels .995, .99, .975, .95, .05, .025, .01, and .005.]


case with $m + n = 2N$. Note that $f(W \mid m, n + 1)$ is made up of two parts: the frequency of $W$ when $W$ does not contain the rank $N + 1$ and the frequency of $W$ when $W$ does contain $N + 1$. The first part is $f(W \mid m, n)$ and the second is $f(W - N - 1 \mid m - 1, n + 1)$, and (9) follows. The demonstration is similar when $m + n = 2N + 1$.







The frequency generating function is easily proved by mathematical induc- 
tion. The fundamental part of the induction is proved using (9). We do not 
give details of the proof since it is easy but somewhat cumbersome, and since 
Barton and David also considered a generating function from which (11) re- 
sults as a special case. 
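The generating function (11) is also a convenient way to compute $f(W \mid m, n)$. One multiplies out the factors $(1 + u^iv)$, tracking powers of $u$ (the value of $W$) and of $v$ (the number of X's). The Python sketch below (names are ours) does this with a dictionary of coefficients. It checks the recursion (9), and a brute-force count, on small cases.

```python
from collections import defaultdict
from itertools import combinations


def generating_coefficients(total: int) -> dict:
    """Coefficients of g(u, v) in (11) for m + n = total, as {(W, number_of_X): count}."""
    N, odd = divmod(total, 2)
    factors = [i for i in range(1, N + 1) for _ in range(2)]  # (1 + u^i v)^2, i = 1..N
    if odd:
        factors.append(N + 1)  # extra factor (1 + u^(N+1) v)
    poly = {(0, 0): 1}
    for i in factors:
        nxt = defaultdict(int)
        for (w, k), c in poly.items():
            nxt[(w, k)] += c          # position not occupied by an X
            nxt[(w + i, k + 1)] += c  # position occupied by an X, contributing rank i
        poly = dict(nxt)
    return poly


def f(W: int, m: int, n: int) -> int:
    """f(W | m, n) read off as the coefficient of u^W v^m."""
    return generating_coefficients(m + n).get((W, m), 0)


def f_brute(W: int, m: int, n: int) -> int:
    N = m + n
    scores = [min(i, N + 1 - i) for i in range(1, N + 1)]
    return sum(1 for pos in combinations(range(N), m) if sum(scores[i] for i in pos) == W)


def recursion_holds(m: int, n: int) -> bool:
    """Check (9): f(W|m,n+1) = f(W|m,n) + f(W-N-1|m-1,n+1) with N = floor((m+n)/2)."""
    N = (m + n) // 2
    return all(f(W, m, n + 1) == f(W, m, n) + f(W - N - 1, m - 1, n + 1)
               for W in range(0, m * (m + n + 2) + 1))


print(f(4, 3, 11), f_brute(4, 3, 11))
```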


4. Moments and Approximate Distributions under $H_0$. The approximate distribution of $W$ under $H_0$ is of interest for applications of the W-test beyond
the scope of the prepared tables. We examine the moments of W on the hypothe- 
sis that all assignments of the X- and Y-observations to the array (2) are equally 
likely. 

Suppose that $m + n = 2N$. Then

(12) $E(W) = mE_1(r) = m(m + n + 2)/4$,

for we consider $E_1(r)$ as the expectation of an integer chosen at random from the first $N$ integers, and $(N + 1)/2 = (m + n + 2)/4$. We write

(13) $E(W^2) = mE_1(r^2) + m(m - 1)E(rs)$

and

(14) $E(rs) = \Big[2\binom{N}{2}E_1(rs) + N^2E_2(rs)\Big]\Big/\binom{2N}{2}$.

Now $E_1(r^2) = (N + 1)(2N + 1)/6$, the expectation of the square of an integer selected at random from the first $N$ integers. Next, $E_1(rs) = (3N + 2)(N + 1)/12$, the expectation of the product of two distinct integers selected at random from the first $N$ integers. Finally, $E_2(rs) = (N + 1)^2/4$, the expectation of the product of two integers selected at random separately from two sets of the first $N$ integers. Coefficients of $E_1$ and $E_2$ in (14) are the appropriate weighting probabilities. Substitution in (14) and then (13) yields

(15) $E(W^2) = [m(N + 1)(2N + 1)/6] + [m(m - 1)(N + 1)(3N^2 + N - 1)/\{6(2N - 1)\}]$.

Then, from (15) and (12), with replacement of $N$ by $(m + n)/2$, we have

(16) $\mu_2 = \sigma_W^2 = mn(m + n - 2)(m + n + 2)/[48(m + n - 1)]$.
Through similar arguments,

(17) $\mu_3 = 0$

and

(18) $\mu_4 = \frac{mn(m + n + 2)}{3840(m + n - 3)(m + n - 2)(m + n - 1)}\big[5mn(m + n)^4 - 2(m^5 + 19m^4n + 52m^3n^2 + 52m^2n^3 + 19mn^4 + n^5) + 4(3m^4 + 16m^3n + 26m^2n^2 + 16mn^3 + 3n^4) - 4(6m^3 - 34m^2n - 34mn^2 + 6n^3) - 16(2m^2 + 25mn + 2n^2) + 96(m + n)\big]$.


When $m + n = 2N + 1$, derivations of moments are slightly more complicated but similar. We obtain

(19) $E(W) = m(m + n + 1)^2/[4(m + n)]$,

(20) $\mu_2 = \sigma_W^2 = mn(m + n + 1)[3 + (m + n)^2]/[48(m + n)^2]$,

(21) $\mu_3 = mn(n - m)(m + n - 1)(m + n + 1)^2/[32(m + n - 2)(m + n)^3]$,

and

(22) $\mu_4 = \frac{mn(m + n + 1)}{3840(m + n - 2)(m + n)^4}\big[5mn(m + n)^6 - (2m^7 + 17m^6n + 57m^5n^2 + 100m^4n^3 + 100m^3n^4 + 57m^2n^5 + 17mn^6 + 2n^7) + 2(m^6 + 14m^5n + 47m^4n^2 + 68m^3n^3 + 47m^2n^4 + 14mn^5 + n^6) + 2(2m^5 - 35m^4n - 115m^3n^2 - 115m^2n^3 - 35mn^4 + 2n^5) + 15(4m^4 - m^3n - 10m^2n^2 - mn^3 + 4n^4) + 15(2m^3 + 9m^2n + 9mn^2 + 2n^3) - 30(m^2 - mn + n^2)\big]$.
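The moment formulas (12) and (16) to (22) can be checked against exact enumeration for small samples. The Python sketch below (names are ours) computes the exact moments of $W$ with rational arithmetic. It compares them with the formulas as printed. This is a consistency check, not a proof.

```python
from fractions import Fraction
from itertools import combinations


def exact_moments(m: int, n: int):
    """Exact E(W) and central moments mu_2, mu_3, mu_4 under H0 by full enumeration."""
    N = m + n
    scores = [min(i, N + 1 - i) for i in range(1, N + 1)]
    values = [sum(scores[i] for i in pos) for pos in combinations(range(N), m)]
    mean = Fraction(sum(values), len(values))

    def central(r: int) -> Fraction:
        return sum((v - mean) ** r for v in values) / len(values)

    return mean, central(2), central(3), central(4)


def formula_moments(m: int, n: int):
    """E(W), mu_2, mu_3, mu_4 from (12), (16)-(18) (m + n even) or (19)-(22) (m + n odd)."""
    s = m + n
    if s % 2 == 0:  # requires s >= 4
        mean = Fraction(m * (s + 2), 4)
        mu2 = Fraction(m * n * (s - 2) * (s + 2), 48 * (s - 1))
        mu3 = Fraction(0)
        bracket = (5 * m * n * s**4
                   - 2 * (m**5 + 19*m**4*n + 52*m**3*n**2 + 52*m**2*n**3 + 19*m*n**4 + n**5)
                   + 4 * (3*m**4 + 16*m**3*n + 26*m**2*n**2 + 16*m*n**3 + 3*n**4)
                   - 4 * (6*m**3 - 34*m**2*n - 34*m*n**2 + 6*n**3)
                   - 16 * (2*m**2 + 25*m*n + 2*n**2)
                   + 96 * s)
        mu4 = Fraction(m * n * (s + 2) * bracket, 3840 * (s - 3) * (s - 2) * (s - 1))
    else:  # requires s >= 3
        mean = Fraction(m * (s + 1) ** 2, 4 * s)
        mu2 = Fraction(m * n * (s + 1) * (3 + s**2), 48 * s**2)
        mu3 = Fraction(m * n * (n - m) * (s - 1) * (s + 1) ** 2, 32 * (s - 2) * s**3)
        bracket = (5 * m * n * s**6
                   - (2*m**7 + 17*m**6*n + 57*m**5*n**2 + 100*m**4*n**3
                      + 100*m**3*n**4 + 57*m**2*n**5 + 17*m*n**6 + 2*n**7)
                   + 2 * (m**6 + 14*m**5*n + 47*m**4*n**2 + 68*m**3*n**3
                          + 47*m**2*n**4 + 14*m*n**5 + n**6)
                   + 2 * (2*m**5 - 35*m**4*n - 115*m**3*n**2 - 115*m**2*n**3
                          - 35*m*n**4 + 2*n**5)
                   + 15 * (4*m**4 - m**3*n - 10*m**2*n**2 - m*n**3 + 4*n**4)
                   + 15 * (2*m**3 + 9*m**2*n + 9*m*n**2 + 2*n**3)
                   - 30 * (m**2 - m*n + n**2))
        mu4 = Fraction(m * n * (s + 1) * bracket, 3840 * (s - 2) * s**4)
    return mean, mu2, mu3, mu4


cases = [(m, n) for m in range(1, 5) for n in range(1, 6) if m + n >= 4]
mismatches = [(m, n) for m, n in cases if exact_moments(m, n) != formula_moments(m, n)]
print("mismatches:", mismatches)
```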
The moments above are sufficient to show that

(23) $\mu_3^2/\mu_2^3 = 0$, $m + n = 2N$,

(24) $\mu_3^2/\mu_2^3 = O(N^{-5})$, $m + n = 2N + 1$,

and

(25) $\mu_4/\mu_2^2 = 3 + O(N^{-1})$, $m + n = 2N$ or $2N + 1$.


The limits are considered as $N \to \infty$ with $m/n$ constant. These results suggest the use of

(26) $u = (W - E(W) \pm \tfrac12)/\sigma_W$

as a standard normal deviate for large $m$ and $n$, with $E(W)$ and $\sigma_W$ obtained from (12) and (16) or from (19) and (20), as $m + n = 2N$ or $2N + 1$. As is often done in similar situations, the $\pm\tfrac12$ in the numerator of (26) is a continuity correction, with the sign chosen to diminish the numerator numerically. Comparisons with the exact distributions of $W$ have shown that the use of the continuity correction is advantageous.
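A Python sketch of the approximation (26) for a lower-tail probability $P(W \le W_0)$ follows; the function names are ours. The correction of $\tfrac12$ is applied so as to shrink $|W_0 - E(W)|$, as described above. The standard normal c.d.f. is obtained from `math.erfc`. For $m = 3$, $n = 11$, $W_0 = 4$ it gives about .0093, the normal-approximation entry in Table 2.

```python
import math


def null_mean_var(m: int, n: int):
    """E(W) and var(W) under H0: (12) and (16) if m + n is even, (19) and (20) if odd."""
    s = m + n
    if s % 2 == 0:
        return m * (s + 2) / 4, m * n * (s - 2) * (s + 2) / (48 * (s - 1))
    return m * (s + 1) ** 2 / (4 * s), m * n * (s + 1) * (3 + s ** 2) / (48 * s ** 2)


def u_statistic(w: float, m: int, n: int) -> float:
    """u in (26), with the 1/2 correction chosen to diminish the numerator numerically."""
    mean, var = null_mean_var(m, n)
    d = w - mean
    if d > 0:
        d = max(d - 0.5, 0.0)
    elif d < 0:
        d = min(d + 0.5, 0.0)
    return d / math.sqrt(var)


def normal_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


for w0 in range(4, 12):
    print(w0, round(normal_cdf(u_statistic(w0, 3, 11)), 4))
```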

In Table 2 we have considered two situations: $m = 3$, $n = 11$ and $m = 7$, $n = 7$. Cumulative probabilities, $P(W \le W_0)$, are shown together with the corresponding probabilities based on the normal approximation. It is seen that the normal approximation is quite useful at these values of $m$ and $n$ and somewhat better when $m = n$ than when $m \ne n$.


TABLE 2
Comparisons of $P(W \le W_0)$ Based on Exact and Normal Approximations* to the Distributions of W

m = 3, n = 11 (The distribution of W is symmetric about W = 12)

W0                4      5      6      7      8      9      10     11
Exact Prob.     .0055  .0165  .0440  .0824  .1429  .2253  .3297  .4396
Normal Approx.  .0093  .0207  .0422  .0775  .1357  .2165  .3188  .4376

m = 7, n = 7 (The distribution of W is symmetric about W = 28)

W0                16     17     18     19     20     21     22     23     24     25     26     27
Exact Prob.     .0006  .0017  .0052  .0122  .0256  .0466  .0804  .1270  .1894  .2652  .3537  .4493
Normal Approx.  .0016  .0035  .0073  .0141  .0232  .0478  .0794  .1233  .1742  .2604  .3502  .4487

* The continuity correction has been used.

The Pearson system of frequency curves may be used to obtain somewhat better approximations to percentage points of the distribution of $W$, particularly when $m \ne n$. The statistic $u$ in (26) is again computed, but now we also require $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$. Table 42 in [9] is then entered with appropriate values of $\beta_1$ and $\beta_2$, and selected percentage points of the distribution of $u$ are read from the table. Trial use of this method suggests that it is better than the normal approximation. However, we believe that the latter is sufficiently good for practical purposes when $m + n$ exceeds values in Table 1.

Ansari [1] has shown additional tables like Table 2 and also illustrated the 
use of the Pearsonian approximation. 


5. Limiting Normality. Limiting normality of W is established through use of 
a theorem of Chernoff and Savage [3]. We first define their notation and then 
show how the theorem applies to W. 

Chernoff and Savage consider two samples as we have done in Section 2. They define $m + n = N$, $\lambda_N = m/N$ and require that for all $N$ the inequalities

$0 < \lambda_0 \le \lambda_N \le 1 - \lambda_0 < 1$

hold for some $\lambda_0 \le \tfrac12$. Sample c.d.f.'s are defined:

$F_m(x) = (\text{number of } X_i \le x)/m$, $G_n(x) = (\text{number of } Y_j \le x)/n$.

The combined sample c.d.f. is $H_N(x) = \lambda_NF_m(x) + (1 - \lambda_N)G_n(x)$; the combined population c.d.f. is $H(x) = \lambda_NF(x) + (1 - \lambda_N)G(x)$. A statistic $T_N$ is defined in two equivalent ways. Firstly,

(27) $T_N = \int_{-\infty}^{\infty}J_N[H_N(x)]\,dF_m(x)$,

where $J_N$ need be defined only at $1/N, 2/N, \cdots, N/N$ but may have its domain of definition extended to $(0, 1)$ by a suitable convention. Secondly,

(28) $mT_N = \sum_{i=1}^{N}E_{Ni}Z_{Ni}$,

where the $E_{Ni}$ are given numbers and $Z_{Ni} = 1$ if $Z_i$ is an X and $Z_{Ni} = 0$ otherwise. The theorem, subject to four conditions, states that

(29) $\lim_{N\to\infty}P\{(T_N - \mu_N)/\sigma_N \le t\} = \int_{-\infty}^{t}(2\pi)^{-1/2}e^{-x^2/2}\,dx$,

with $\mu_N$ and $\sigma_N$ given in terms of quantities here defined.
Details on the application of the theorem are given in [1] but omitted here for brevity. $W$ and $T_N$ are associated through

(30) $T_N = W/(mN)$.

The association follows when we define

(31) $J_N[H_N(x)] = \tfrac12 + \tfrac{1}{2N} - \big|\tfrac12 + \tfrac{1}{2N} - H_N(x)\big|$

and

(32) $E_{Ni} = \tfrac12 + \tfrac{1}{2N} - \big|\tfrac12 + \tfrac{1}{2N} - \tfrac{i}{N}\big|$, $i = 1, \cdots, N$.

The four conditions of the theorem may be checked, except that the fourth does not hold when $H = \tfrac12$. This is a point of measure zero, and the exception is permissible, as a review of the proof of the theorem shows.
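The association between $W$ and $T_N$ can be checked directly. $N E_{Ni}$ reproduces the two-ended ranks (3) and (4), and evaluating (27) with the sample c.d.f.'s gives $T_N = W/(mN)$. The Python sketch below does both with exact fractions on randomly generated, tie-free samples. The sample sizes and seeds are arbitrary.

```python
import random
from fractions import Fraction


def J(h: Fraction, N: int) -> Fraction:
    """Score function (31): J_N(h) = 1/2 + 1/(2N) - |1/2 + 1/(2N) - h|."""
    c = Fraction(1, 2) + Fraction(1, 2 * N)
    return c - abs(c - h)


def scores_match_ranks(N: int) -> bool:
    """N * E_Ni from (32) should equal the two-ended rank min(i, N + 1 - i)."""
    return all(N * J(Fraction(i, N), N) == min(i, N + 1 - i) for i in range(1, N + 1))


def t_n(x, y) -> Fraction:
    """T_N from (27): average of J_N[H_N(X_i)] over the X-sample, H_N = combined sample c.d.f."""
    combined = sorted(x + y)
    N = len(combined)
    return sum(J(Fraction(combined.index(v) + 1, N), N) for v in x) / len(x)


def w_stat(x, y) -> int:
    """W as the sum of two-ended ranks of the X-observations."""
    combined = sorted(x + y)
    N = len(combined)
    return sum(min(k, N + 1 - k) for k in (combined.index(v) + 1 for v in x))


rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(6)]
y = [rng.gauss(0, 2) for _ in range(9)]
N = len(x) + len(y)
print(t_n(x, y) == Fraction(w_stat(x, y), len(x) * N))
```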

We may evaluate $\mu_N$ and $\sigma_N$ under $H_0$, where $F(x) = G(x)$, and obtain $\mu_N = \tfrac14$ and $\sigma_N^2 = n/(48mN)$. These results are asymptotically equivalent to (12) or (19) and (16) or (20), respectively. In practice, in applying the limiting normal distribution under $H_0$, we recommend the normal approximation outlined in Section 4.

The establishment of the limiting normality of $W$ under $H_0$ in (1) and under alternatives with $\theta \ne 1$ is required in the following discussions of relative efficiencies.


6. Consistency of the W-Test. Consistency of the W-test of (1),

$H_0$: $\theta = 1$, $F(u) = G(u)$,

against alternatives $H_1$: $F(\theta u) = G(u)$, $\theta \ne 1$, is indicated by the Chernoff-Savage Theorem. When $H_0$ is true, $T_N \to \tfrac14$ in probability, since $\sigma_N^2 = n/(48mN) \to 0$ as $m, n \to \infty$ in constant ratio, or when $\lambda_N$ is bounded as required by the theorem. When $H_1$ is true, it can be seen from the theorem that $\sigma_N \to 0$ and $T_N \to \mu_N$ in probability, with

(33) $|\mu_N - \tfrac14| = (1 - \lambda_N)\int_{-\infty}^{\infty}|F(x) - F(\theta x)|\,dF(x) > 0$, $\theta \ne 1$,

the last result depending on a zero median, $F(0) = \tfrac12$. The test based on $T_N$ is










consistent and consequently the equivalent W-test is consistent against the 
alternatives indicated. 


7. Relative Efficiencies. The relative efficiency of the W-test in comparison with other tests of dispersion may be obtained following the method of Pitman and Noether [10]. Local alternatives are considered and we define

(33) $\theta_N = 1 + \gamma/\sqrt{N}$,

with $N = m + n$, $\theta_N$ replacing $\theta$ and now dependent on $N$ as required by Noether. The efficacy $E_W$ of the W-test, or equivalently of the $T_N$-test, is required and is

(34) $E_W = [dE(T_N \mid \theta)/d\theta\,|_{\theta=1}]^2/\sigma_N^2(\theta)|_{\theta=1}$.

$E_W$ is evaluated by placing $n/(48mN)$ in the denominator of (34) and by differentiating $\mu_N$ with respect to $\theta$. With our definition of $T_N$, $\mu_N$ has the special form

(35) $E(T_N \mid \theta) = \mu_N = \int_{-\infty}^{\infty}\big[\tfrac12 - |\tfrac12 - \lambda_NF(x) - (1 - \lambda_N)F(\theta x)|\big]\,dF(x)$.

We differentiate under the integral sign and obtain

(36) $dE(T_N \mid \theta)/d\theta\,|_{\theta=1} = (1 - \lambda_N)\Big[\int_{-\infty}^{0}xf^2(x)\,dx - \int_{0}^{\infty}xf^2(x)\,dx\Big]$,

where $f(x)$ is the density function associated with $F(x)$. Now

(37) $E_W = \frac{48mn}{N}\Big[\int_{-\infty}^{0}xf^2(x)\,dx - \int_{0}^{\infty}xf^2(x)\,dx\Big]^2$

when $\lambda_N$ is replaced by $m/N$.

$E_W$ in (37) is correct except for terms $O(1/N)$, for the derivation was based on asymptotic results from the Chernoff-Savage Theorem. We have also obtained $E_W$ more directly, but the derivation is lengthy and will not be given in detail. We refer to the definition of $W$ in (6) and consider, for illustration, the case with $m + n$ even. The random variables in (6) are the $\delta_i$. Let $X_{(\alpha)}$ be the $\alpha$th smallest X; then

(38) $P(\delta_i = 1) = \sum_{\alpha=\max(1,\,i-n)}^{\min(i,\,m)}P(Z_i = X_{(\alpha)})$
$= \sum_{\alpha}\frac{m!\,n!}{(\alpha - 1)!(m - \alpha)!(i - \alpha)!(n - i + \alpha)!}\int_{-\infty}^{\infty}F^{\alpha-1}(x)[1 - F(x)]^{m-\alpha}F^{i-\alpha}(\theta x)[1 - F(\theta x)]^{n-i+\alpha}f(x)\,dx$.

From (38) and (6), expressions for $E(W \mid \theta)$ and $dE(W \mid \theta)/d\theta\,|_{\theta=1}$ may be obtained and reduced, the final reduction being based on an interesting application of the method of steepest descent in the evaluation of integrals. We used the form for $\sigma_W$ in (16) and obtained a result asymptotically equivalent to (37). A form similar to (38) was also obtained with $m + n$ odd.
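A quick sanity check of (38): at $\theta = 1$ every position is equally likely to hold an X, so $P(\delta_i = 1)$ must equal $m/(m+n)$. Take $F$ uniform on $(0, 1)$; any continuous $F$ gives the same value at $\theta = 1$. The integral in (38) then becomes a Beta integral, and the Python sketch below sums the terms exactly.

```python
from fractions import Fraction
from math import factorial


def p_delta_null(i: int, m: int, n: int) -> Fraction:
    """(38) at theta = 1 with F uniform on (0, 1), so the integral is a Beta integral."""
    total = Fraction(0)
    for a in range(max(1, i - n), min(i, m) + 1):
        coef = Fraction(factorial(m) * factorial(n),
                        factorial(a - 1) * factorial(m - a) * factorial(i - a) * factorial(n - i + a))
        # At theta = 1 the integrand is u^(i-1) (1-u)^(m+n-i); its integral is (i-1)!(m+n-i)!/(m+n)!
        beta = Fraction(factorial(i - 1) * factorial(m + n - i), factorial(m + n))
        total += coef * beta
    return total


print([p_delta_null(i, 3, 4) for i in range(1, 8)])  # each entry should be 3/7
```

By Vandermonde's identity the sum collapses to $m/(m+n)$ for every position $i$, which is what the check confirms.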







Noether set forth four conditions for the validity of the calculation of asymp- 
totic relative efficiencies. The first three are easily checked for the W-test and 
the fourth involves uniform limiting normality which follows from the Chernoff- 
Savage Theorem. 

The efficacy of the F-test for variances is

(39) $E_F = 4mn/[(m + n)(\beta_2 - 1)]$,

where

(40) $\beta_2 = \int_{-\infty}^{\infty}[x - E(x)]^4\,dF(x)\Big/\Big[\int_{-\infty}^{\infty}[x - E(x)]^2\,dF(x)\Big]^2$,

as described by Sukhatme [13]. The relative efficiency of the W-test to the F-test is $e_{W,F} = \lim_{N\to\infty}E_W/E_F$ and reduces to

(41) $e_{W,F} = 12(\beta_2 - 1)\Big[\int_{-\infty}^{0}xf^2(x)\,dx - \int_{0}^{\infty}xf^2(x)\,dx\Big]^2$.

Special cases follow:

(i). If $f(x) = e^{-x^2/2}/\sqrt{2\pi}$, $e_{W,F} = 6/\pi^2 = .61$.

(ii). If $f(x) = 1$, $-\tfrac12 \le x \le \tfrac12$, $e_{W,F} = .60$.

(iii). If $f(x) = \tfrac12e^{-|x|}$, $e_{W,F} = .94$.
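The three special cases can be reproduced by evaluating (41) numerically. The Python sketch below uses composite Simpson's rule on each half-line, truncated where the integrand is negligible. The truncation points and step counts are our choices. It returns about .608, .600 and .938, in agreement with $6/\pi^2$, .60 and .94.

```python
import math


def simpson(g, a: float, b: float, k: int = 20_000) -> float:
    """Composite Simpson's rule with an even number k of subintervals."""
    h = (b - a) / k
    total = g(a) + g(b)
    for j in range(1, k):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3


def e_wf(f, beta2: float, lo: float, hi: float) -> float:
    """(41): 12 (beta2 - 1) [int_{-inf}^0 x f^2 dx - int_0^inf x f^2 dx]^2, tails cut at lo, hi."""
    left = simpson(lambda x: x * f(x) ** 2, lo, 0.0)
    right = simpson(lambda x: x * f(x) ** 2, 0.0, hi)
    return 12 * (beta2 - 1) * (left - right) ** 2


def normal(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def uniform(x):
    return 1.0  # density on [-1/2, 1/2]; only integrated over that interval below


def laplace(x):
    return 0.5 * math.exp(-abs(x))


cases = {
    "normal": e_wf(normal, 3.0, -12.0, 12.0),
    "uniform": e_wf(uniform, 1.8, -0.5, 0.5),
    "double exponential": e_wf(laplace, 6.0, -40.0, 40.0),
}
for name, value in cases.items():
    print(f"{name}: {value:.4f}")
```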

8. Other Tests of Dispersion. The W-test has the same relative efficiencies as 
Sukhatme’s first test [13], as might be expected from their similarity. 

Sukhatme [14] has proposed a second test. The statistic is 


$= > OX, XY +2D LANNY) 


tl jm] 


(42) jptk tj 
. 2 n m 
+778 2% & K(X, ¥), 
where 


$Q(u, v, w) = 1$ if $0 < u < w$, $0 < v < w$ or $w < u < 0$, $w < v < 0$,
$Q(u, v, w) = 0$ otherwise,

and

$K(u, v) = 1$ if $0 < u < v$ or $v < u < 0$,
$K(u, v) = 0$ otherwise.


The relative efficiency es, is 


© 0 
UD): anem ~ (8; — 1) E fi 2F(2)f*(2) dz — [ af*(z) as}. 


In the normal case (i), $e_{S,F} = .69$; in the uniform case (ii), $e_{S,F} = .80$; in the double exponential case (iii), $e_{S,F} = 1.03$. Sukhatme's S-test requires knowledge of the locations of the two populations.
Mood [8] has proposed the statistic 


(44) M = > @ a mintty 


where r; is the rank of X, in (2). The relative efficiency of Mood’s test, as de- 
rived in [13], is 


(45) €ur = 45(8. — 1) E [ rF(x)f*(x) dx — [ af*(z) as. 


$e_{M,F}$ has values (i) .76, (ii) 1, and (iii) 1.08, respectively, for the three distributions considered. Mood's test requires only knowledge of the relative locations of the two populations.
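Formula (45) can be checked in the same way. The Python sketch below again assumes densities symmetric about zero, truncates the infinite ranges, and evaluates (45) by Simpson quadrature. The c.d.f.'s are written out by hand, the normal one through `math.erf`. The case table and function names are illustrative. The printed values should agree with .76, 1 and 1.08 to the rounding shown.

```python
import math


def simpson(g, a, b, n=20000):
    """Composite Simpson's rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def e_mf(f, cdf, lo, hi):
    """Equation (45): 45(beta_2 - 1)[int 2x F f^2 dx - int x f^2 dx]^2, with E(x) = 0 assumed."""
    beta2 = (simpson(lambda x: x ** 4 * f(x), lo, hi)
             / simpson(lambda x: x ** 2 * f(x), lo, hi) ** 2)
    first = simpson(lambda x: 2 * x * cdf(x) * f(x) ** 2, lo, hi)
    second = simpson(lambda x: x * f(x) ** 2, lo, hi)
    return 45.0 * (beta2 - 1.0) * (first - second) ** 2


# (density, c.d.f., lower limit, upper limit) for cases (i)-(iii).
CASES = {
    "normal": (lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi),
               lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2))), -12.0, 12.0),
    "uniform": (lambda x: 1.0, lambda x: x + 0.5, -0.5, 0.5),
    "double exponential": (lambda x: 0.5 * math.exp(-abs(x)),
                           lambda x: 0.5 * math.exp(x) if x < 0 else 1 - 0.5 * math.exp(-x),
                           -60.0, 60.0),
}

for name, (f, cdf, lo, hi) in CASES.items():
    print(f"{name}: e_MF = {e_mf(f, cdf, lo, hi):.3f}")
```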

Lehmann ([7], p. 173) has proposed a test that does not depend on knowledge of even the relative locations of the two populations, but the null distribution of his statistic is not distribution-free. The statistic, in a form given by Sukhatme [13], is


(46)  $$L = \sum_{i<j} \sum_{h<k} \psi(|X_i - X_j|, |Y_h - Y_k|) \Big/ \binom{m}{2}\binom{n}{2},$$

where $\psi(u, v) = 1$ if $u < v$ and $\psi(u, v) = 0$ otherwise. Relative efficiencies are not known, because the dependence of the test on the natures of the populations sampled introduces difficulties.
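Here is a brute-force Python sketch of (46), reading $\psi$ as the indicator defined above. The name `lehmann_statistic` is illustrative. Only within-sample differences enter, so the value is unchanged if either sample is shifted; this is why the test needs no knowledge of locations. The cost is $O(m^2 n^2)$ comparisons, which is acceptable for small samples.

```python
from itertools import combinations


def lehmann_statistic(x, y):
    """(46): proportion of (X-pair, Y-pair) combinations with |X_i - X_j| < |Y_h - Y_k|."""
    x_spreads = [abs(a - b) for a, b in combinations(x, 2)]
    y_spreads = [abs(a - b) for a, b in combinations(y, 2)]
    hits = sum(1 for u in x_spreads for v in y_spreads if u < v)  # psi(u, v) = 1 iff u < v
    return hits / (len(x_spreads) * len(y_spreads))  # divide by C(m, 2) C(n, 2)


x = [0.125, -0.5, 0.375, 0.0]
y = [2.0, -1.5, 0.75, -3.25, 1.25]
print(lehmann_statistic(x, y), lehmann_statistic(x, [v + 8.0 for v in y]))
```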

The properties of the David and Barton test are those of the W-test in the 
special case in which the two are equivalent. 

Relative efficiencies have been given for the tests discussed, in comparison with the F-test for variances. The relative efficiency of one rank test to another may be obtained as the ratio of the two relative efficiencies given.
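For instance, using only the two-decimal efficiencies quoted in this section (so the ratios inherit their rounding), the efficiency of the W-test relative to Mood's test is roughly .80, .60 and .87 for cases (i), (ii) and (iii). The small Python sketch below performs that arithmetic for any pair of tests. The dictionary restates the values in the text, and the names are illustrative.

```python
# e_{T,F} for cases (i) normal, (ii) uniform, (iii) double exponential, as quoted above.
EFFICIENCY_VS_F = {
    "W": (0.61, 0.60, 0.94),
    "S": (0.69, 0.80, 1.03),  # Sukhatme's second test
    "M": (0.76, 1.00, 1.08),  # Mood's test
}


def relative_efficiency(test_a, test_b):
    """e_{A,B} = e_{A,F} / e_{B,F}, case by case."""
    return tuple(a / b for a, b in zip(EFFICIENCY_VS_F[test_a], EFFICIENCY_VS_F[test_b]))


for other in ("S", "M"):
    print(other, [round(e, 2) for e in relative_efficiency("W", other)])
```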

The W-test is an improvement on Sukhatme's first test but is less efficient, though easier to use, than Sukhatme's second test. Mood's test is the most natural one against the background of normal-theory statistics, and its efficiencies are the best. The W-test is somewhat easier to compute and, with tables, may be useful in many situations where a quick and easy test is desired.


9. Discussion. It is a disadvantage of the W-test that the relative locations of the two populations must be known. This disadvantage, or more serious ones, is also present in the other tests discussed. If the X- and Y-samples cannot be adjusted so that $\mu_X - \mu_Y = 0$, differences in locations seriously affect all of the tests of dispersion.

We would like to modify the W-test so that the X- and Y-samples are adjusted in location on the basis of information from the samples themselves. One possibility is to consider the sample medians $\tilde X$ and $\tilde Y$ (the middle observation, or the average of the two middle observations, for odd- or even-size samples respectively) and to let $U_i = X_i - \tilde X$ and $V_j = Y_j - \tilde Y$. A new array like (2) may be formed from the samples of U and V. We would like to compute the W-statistic again for this new array and to refer to the test based on it as the modified W-test. But the distributions of the modified statistic are unknown and very difficult to investigate, since the U's and V's are not independent. It does appear that the modified W-test should have the same asymptotic properties as the W-test, but this has not been verified.

TABLE 3
The Null Cumulative Distribution of the Modified W for m = n = 9 from a Sampling Study and Corresponding Expected Frequencies for W Computed When m = n = 8

w                           27     28     29     30     31     32     33     34     35     36
Observed cum. freq.          5      7     12     14      ?      ?     36     46     55     63
Expected cum. freq. (W)    3.6    5.7    8.7   12.6   17.5   23.5   30.3   37.9   45.9   54.1

w                           37     38     39     40     41     42     43     44      …     48
Observed cum. freq.         69     80     84     87      ?     95     97     99      …    100
Expected cum. freq. (W)   62.1   69.7   76.5   82.4   87.4   91.3   94.3   96.4      …   99.7

We would like to use the modified W-test as though it were the W-test. The appropriate way to do this seems to be to drop the zero values of U and V when they occur and then to proceed with the W-test on the reduced samples of U and V. A check on the appropriateness of this was made by a sampling study. One hundred pairs of samples with m = n = 9 were taken from a table of random normal deviates and reduced to samples of eight U's and eight V's; the distribution of the modified statistic obtained is shown in Table 3. The mean and variance of the modified statistic from the sampling study were 35.3 and 19.7, respectively. The corresponding values for a W-test with m = n = 8, obtained from (12) and (16), are 36 and 22.5. This suggests that the normal approximation for the W-test may be used for the modified W-test, but this limited study is not conclusive.
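The reduction just described is easy to mechanize. The Python sketch below illustrates it under stated assumptions and does not reproduce the authors' study. First, it assumes the W-scoring of the earlier sections: the ordered combined sample receives scores 1, 2, 3, ... inward from each end, and W sums the scores of the X's. Second, it draws deviates from Python's `random` module, not from a printed table. Third, its function names are hypothetical. Its mean and variance will therefore differ from 35.3 and 19.7 by sampling error. Under the assumed scoring, the exact null mean for m = n = 8 is $8 \times 72/16 = 36$.

```python
import random
import statistics


def end_scores(total_n):
    """Scores 1, 2, 3, ..., 3, 2, 1 assigned inward from both ends (assumed W-scoring)."""
    return [min(i, total_n + 1 - i) for i in range(1, total_n + 1)]


def w_statistic(x, y):
    """Sum of the end-scores of the X-observations in the combined ordering (no ties)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    return sum(s for s, (_, label) in zip(end_scores(len(combined)), combined) if label == 0)


def median_reduce(sample):
    """Deviations from the sample median, with zero deviations dropped."""
    med = statistics.median(sample)
    return [v - med for v in sample if v != med]


def sampling_study(reps=100, m=9, n=9, seed=1960):
    """Mean and variance of W computed on median-reduced normal samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        u = median_reduce([rng.gauss(0.0, 1.0) for _ in range(m)])
        v = median_reduce([rng.gauss(0.0, 1.0) for _ in range(n)])
        values.append(w_statistic(u, v))
    return statistics.mean(values), statistics.variance(values)


print(sampling_study())
```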

Other generalizations may be possible and merit investigation. The W-test 
may be extended to several samples for a rank analogue to a test of homogeneity 
of variances. Problems associated with the largest or smallest scale parameter 
of a set of populations might be considered. 

The problem of ties has not been investigated. The effects of ties could be 
studied, but we suggest that the usual procedure of giving a tied rank the average 
rank for the set of tied values should be adequate. 
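For the W-test, this suggestion amounts to averaging the end-scores over each tied group. The minimal Python sketch below assumes the inward 1, 2, 3, ... scoring from both ends, an assumption carried over from the earlier sections, and uses the hypothetical name `tied_end_scores`.

```python
from itertools import groupby


def tied_end_scores(values):
    """Pair each sorted value with its end-score, averaged over tied groups."""
    ordered = sorted(values)
    total_n = len(ordered)
    base = [min(i, total_n + 1 - i) for i in range(1, total_n + 1)]
    result, pos = [], 0
    for value, group in groupby(ordered):
        size = len(list(group))
        average = sum(base[pos:pos + size]) / size  # mean score of the tied positions
        result.extend([(value, average)] * size)
        pos += size
    return result


print(tied_end_scores([1.0, 1.0, 2.0, 3.0]))
```

Averaging preserves the total of the scores, so the null mean of W is unaffected. Its null variance, however, changes, and that is part of what a study of ties would need to address.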

We have chosen to consider the test of $H_0$: $F(u) = G(u)$ against alternatives $F(\theta u) = G(u)$, $\theta \ne 1$, given that the X- and Y-populations have, or may be adjusted to have, a common, but not necessarily known, location as measured by their medians. It is interesting to consider what may happen if other alternatives are met. Firstly, we note that the proof of consistency of the W-test in Section 6 depends on a common median (taken to be zero without loss of generality) for the two populations, but that otherwise (33) applies for any $G(x) \ne F(x)$







replacing $F(\theta x)$ in (33). Hence the W-test is consistent against a much wider class of alternatives, although power should be best for the situation considered, and this seems to be the important one. The W-test can lose its sensitivity for detecting differences between dispersions in the presence of differences between medians. If the difference between population medians is large and dispersions are relatively small, it can happen that all X-observations precede all Y-observations in (2). If also m = n, then W = E(W), and $H_0$ would not be rejected even if there are differences in dispersions. It is important to note, as stated in [5] and similarly in [12], that in general a rejection of $H_0$ may be attributed to differences in dispersions. There is one exception: if m is very small compared to n, a very small value of W could be due to a difference in medians, but this should be immediately apparent from an inspection of the data.

A concluding remark may be made. There is some interest in the possible forms that statistics may take. Sukhatme's statistic [13] may be adjusted to estimate the probability $P(|X| < |Y|)$. The situation is not so clear for the W-test. We note that W − Min W is a count of the number of X's nearer the combined-sample median than Y's. If we consider (W − Min W)/mn, asymptotically we have an estimate of $P(|X - \mu| < |Y - \mu|)$, where μ is the common median of the two populations. This asymptotic result does suggest that the W-test will be consistent against alternatives for which

$$P(|X - \mu| < |Y - \mu|) \ne \tfrac{1}{2}.$$
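The probability $P(|X - \mu| < |Y - \mu|)$ can also be estimated directly by counting pairs, which gives a point of comparison for (W − Min W)/mn. The Python sketch below assumes that the common median μ is known. The function name is illustrative, and the count is not claimed to coincide with (W − Min W)/mn in finite samples.

```python
def closeness_proportion(x, y, mu):
    """Fraction of the m*n pairs (X_i, Y_j) with |X_i - mu| < |Y_j - mu|."""
    hits = sum(1 for a in x for b in y if abs(a - mu) < abs(b - mu))
    return hits / (len(x) * len(y))


x = [0.2, -0.1, 0.4, -0.3]
y = [1.5, -2.0, 0.1, -0.8, 2.5]
print(closeness_proportion(x, y, 0.0))
```

Under $H_0$ with continuous F = G, its expectation is ½. Values far from ½ point towards the alternatives just described.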
10. Acknowledgments. We are pleased to acknowledge the assistance given us by John E. Freund through his original proposal of the problem and his work with A. R. Ansari in [5]. R. E. Bargmann has also been most helpful through discussions on various points included in the paper. Discussions with I. R. Savage at the 1958 Summer Statistical Institute⁴ were particularly stimulating.


REFERENCES 

[1] Ansari, A. R., Two Way Rank-Sum Tests for Variances, Ph.D. Thesis, Virginia Polytechnic Institute Library, Blacksburg, Virginia (1959).

[2] Barton, D. E., "The limiting distribution of Kamat's test statistic," Biometrika, Vol. 43 (1956), pp. 386-387.

[3] Chernoff, Herman and Savage, I. Richard, "Asymptotic normality and efficiency of certain non-parametric test statistics," Ann. Math. Stat., Vol. 29 (1958), pp. 972-994.

[4] David, Florence N. and Barton, D. E., "A test for birth-order effects," Ann. Human Genetics, Vol. 22 (1958), pp. 250-257.

[5] Freund, J. E. and Ansari, A. R., Two-Way Rank-Sum Tests for Variances, Technical Report No. 34 to Office of Ordnance Research and National Science Foundation, Virginia Polytechnic Institute, August (1957).

[6] Kamat, A. R., "A two-sample distribution-free test," Biometrika, Vol. 43 (1956), pp. 377-385.


⁴ A. R. Ansari wishes to acknowledge a grant from the National Science Foundation allowing him to participate in the Institute.







[7] Lehmann, E. L., "Consistency and unbiasedness of certain nonparametric tests," Ann. Math. Stat., Vol. 22 (1951), pp. 165-179.

[8] Mood, A. M., "On the asymptotic efficiency of certain non-parametric two-sample tests," Ann. Math. Stat., Vol. 25 (1954), pp. 514-522.

[9] Pearson, E. S. and Hartley, H. O., Biometrika Tables for Statisticians, Vol. 1, Cambridge University Press, Cambridge (1956).

[10] Noether, Gottfried E., "On a theorem of Pitman," Ann. Math. Stat., Vol. 26 (1955), pp. 64-68.

[11] Rosenbaum, S., "Tables for a non-parametric test of dispersion," Ann. Math. Stat., Vol. 24 (1953), pp. 663-668.

[12] Siegel, Sidney and Tukey, John W., "A nonparametric sum of ranks procedure for relative spread in unpaired samples," J. Amer. Stat. Assn., Vol. 55 (1960), pp. 429-445.

[13] Sukhatme, Balkrishna V., "On certain two-sample non-parametric tests for variances," Ann. Math. Stat., Vol. 28 (1957), pp. 188-194.

[14] Sukhatme, Balkrishna V., "A two-sample distribution-free test for variances," Biometrika, Vol. 45 (1958), pp. 544-548.

[15] Terry, Milton E., Some Rank-Order Tests which are Most Powerful Against Specific Parametric Alternatives, Ph.D. Thesis, University of North Carolina Library, Chapel Hill, North Carolina (1951).





A RELATIONSHIP BETWEEN HODGES' BIVARIATE SIGN
TEST AND A NON-PARAMETRIC TEST OF DANIELS¹


By Bruce M. Hill


Stanford University 


The null distribution of the statistic K of Hodges' bivariate sign test ([2] and [3]) is the same as the null distribution of the statistic m proposed by Daniels [1] to test the hypothesis Median$\{Y \mid x\} = \alpha_0 + \beta_0 x$, where $\alpha_0$ and $\beta_0$ are given. For suppose we consider a sequence $S_1, S_2, \cdots, S_n$, where each $S_i$ is either +1 or −1. We say that two such sequences agree j times if there exist exactly j places at which the sequences agree. Let $t_k$, $k = 0, 1, \cdots, n - 1$, be the number of agreements of the sequence $S_1, \cdots, S_n$ with the sequence whose first n − k values are +1 and whose last k values are −1 (called the kth sequence). Let $t_{n+i}$, $i = 0, \cdots, n - 1$, be the number of agreements of $S_1, \cdots, S_n$ with the sequence obtained by changing each sign of the ith sequence. Clearly $t_{n+i} = n - t_i$, $i = 0, \cdots, n - 1$, and

$$t_{k+1} = \begin{cases} t_k + 1 & \text{if } S_{n-k} = -1, \\ t_k - 1 & \text{if } S_{n-k} = +1, \end{cases} \qquad k = 0, \cdots, n - 1.$$


Now envisage the sequence $S_1, \cdots, S_n$ placed in order, reading from left to right, at equal intervals on the upper half of a circle ($S_1$ and $S_n$ being above the horizontal diameter), and the value $-S_k$ placed at the point on the circle diametrically opposite $S_k$, $k = 1, \cdots, n$. Such an arrangement is sketched below for the case n = 5.

Let $P_k$, $k = 0, \cdots, 2n - 1$, be the number of positive values lying on the upper semicircle after k steps of a clockwise rotation have been taken (at which time $S_1$ will occupy the position formerly occupied by $S_{k+1}$, and $S_{n-k}$ will occupy the position formerly occupied by $S_n$). Then clearly $P_0 = t_0$,

$$P_{k+1} = \begin{cases} P_k + 1 & \text{if } S_{n-k} = -1, \\ P_k - 1 & \text{if } S_{n-k} = +1, \end{cases} \qquad k = 0, \cdots, n - 1,$$

and $P_{n+i} = n - P_i$, $i = 0, \cdots, n - 1$. Since the $t_k$ also satisfy these last two relationships, it follows that $t_k = P_k$, $k = 0, \cdots, 2n - 1$. Hence $m = n - \max t_i = n - \max P_i = K$, where both maxima are over $i = 0, 1, \cdots, 2n - 1$. Since each sequence $S_1, \cdots, S_n$ has probability $1/2^n$ under the null hypothesis of both Daniels' and Hodges' tests, and since m and K depend only on the observed sequence $S_1, \cdots, S_n$, it follows that the null distributions of m and K are identical.
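The identity $t_k = P_k$ and the resulting null distribution can be checked by brute force for small n. The Python sketch below computes the $t_k$ directly from their definition. It computes the $P_k$ by literally rotating the circle, stored clockwise as $S_1, \cdots, S_n, -S_1, \cdots, -S_n$ so that antipodal points are n places apart. It then tabulates $K = n - \max_i t_i$ over all $2^n$ equally likely sign sequences. The function names are illustrative.

```python
from collections import Counter
from itertools import product


def agreements(s):
    """t_0, ..., t_{2n-1}: agreements with the kth sequence, then with the sign-changed ones."""
    n = len(s)
    t = []
    for k in range(n):
        kth = [1] * (n - k) + [-1] * k
        t.append(sum(a == b for a, b in zip(s, kth)))
    return t + [n - tk for tk in t]


def rotation_counts(s):
    """P_0, ..., P_{2n-1}: positive values on the upper semicircle after k clockwise steps."""
    n = len(s)
    ring = list(s) + [-v for v in s]  # clockwise order; the upper half is indices 0..n-1
    # One clockwise step moves the value at index p to index p + 1 (mod 2n).
    return [sum(ring[(i - k) % (2 * n)] > 0 for i in range(n)) for k in range(2 * n)]


def null_distribution(n):
    """Exact null distribution of K = m = n - max t_i under equally likely signs."""
    counts = Counter()
    for s in product((1, -1), repeat=n):
        t = agreements(s)
        assert t == rotation_counts(s)  # the identity t_k = P_k
        counts[n - max(t)] += 1
    return {k: c / 2 ** n for k, c in sorted(counts.items())}


print(null_distribution(8))
```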


Received February 8, 1960; revised July 25, 1960.
¹ This report was supported in part by a Public Health Service Training Grant in Biometry.







HODGES’ AND DANIELS’ TESTS 


Results for Daniels' m may thus be applied to Hodges' K, and vice versa. For example, we can apply Daniels' approximation (under the null hypothesis),

$$\Pr\{m \le m_0\} \approx 4(n - 2m_0)\,n^{-1/2}(2\pi)^{-1/2} \sum_{j=0}^{\infty} \exp\left[-\tfrac{1}{2}(2j + 1)^2 (n - 2m_0)^2/n\right],$$

to Hodges' K. Also we have $K = m \le [(n - 1)/2]$, whereas Hodges only shows $K \le n/2$.
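The Python sketch below compares the approximation $\Pr\{m \le m_0\} \approx 4z(2\pi)^{-1/2}\sum_{j \ge 0}\exp[-\tfrac{1}{2}(2j+1)^2 z^2]$, with $z = (n - 2m_0)n^{-1/2}$, against the exact null probability obtained by enumeration. The enumeration uses $2t_k - n = 2R_{n-k} - T$, where $R_j$ is the jth partial sum of the signs and T their total, so that $m = (n - \max_j |2R_j - T|)/2$. This rewriting is ours, derived from the definition of $t_k$. For small n the two can differ noticeably, since the approximation is asymptotic. The function names are illustrative.

```python
import math
from itertools import accumulate, product


def daniels_approx(n, m0, terms=200):
    """Pr{m <= m0} ~ 4 z (2 pi)^(-1/2) sum_j exp(-(2j + 1)^2 z^2 / 2), z = (n - 2 m0)/sqrt(n)."""
    z = (n - 2 * m0) / math.sqrt(n)
    series = sum(math.exp(-0.5 * (2 * j + 1) ** 2 * z * z) for j in range(terms))
    return 4 * z / math.sqrt(2 * math.pi) * series


def exact_tail(n, m0):
    """Exact Pr{m <= m0} under H0 by enumerating all 2^n sign sequences."""
    hits = 0
    for s in product((1, -1), repeat=n):
        partial = list(accumulate(s))
        total = partial[-1]
        m = (n - max(abs(2 * r - total) for r in partial)) // 2
        hits += m <= m0
    return hits / 2 ** n


n = 14
for m0 in range(0, 5):
    print(m0, round(exact_tail(n, m0), 4), round(daniels_approx(n, m0), 4))
```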

Hodges' restriction to K < n/3 [2] seems to have no relevance to Daniels' problem, and Klotz [3] has already obtained the null distribution of Hodges' test with the restriction removed. Each of the three authors includes a null-distribution table in his paper; these tables agree, Klotz's being the most complete.

Daniels is able to obtain the power of his test only in the case where the alternative line is parallel to the null-hypothesis line, while Hodges does not consider the power function. The alternatives for the Hodges test that correspond to the parallel-line alternatives for the Daniels test satisfy (in Hodges' notation)


/ 
P. , =~ Be 9 w= 


* = 8, (2; — a) + (ys — ys)’ = a\ = P(B,d) = P, 
where—-~ S68#05+%,0<d<+,i = 1,---,n. Letting 9:(f, 9) be 
the density function of (1, — x, = yi), we must then have 
$$g_i(\xi, \eta)\big/[g_i(\xi, \eta) + g_i(-\xi, -\eta)] = P$$
or
$$(1 - P)\,g_i(\xi, \eta) = P\,g_i(-\xi, -\eta), \quad \text{for } i = 1, \cdots, n,\ \eta > 0, \text{ and all } \xi.$$


The conditional power of Hodges' test against those alternatives for which $(1 - P)g_i(\xi, \eta) = P g_i(-\xi, -\eta)$, $i = 1, \cdots, n$, $\eta > 0$, is then given by Daniels'





1192 BRUCE M. HILL 


approximation [1], 


$\Pr\{K \le m_0\} = \Pr\{m \le m_0\}$


= 1-2 PM [b( (25 + 1)ze + uw) — ®((2j — 1)% + w)) 


3 


+ Qe "hehe a oP} (2yty AS ersten? 
j=0 


where
$z_0 = (n - 2m_0)\,n^{-1/2}$,
Prie; > a — a} = P; = P = (1 — un™)/2, 
$Q_i = Q = (1 + u\,n^{-1/2})/2$.


Here $\Pr\{m \le m_0\}$ is Daniels' probability of rejecting the null hypothesis that the true line is $\alpha_0 + \beta_0 x$ when in fact the true line is $\alpha_1 + \beta_0 x$, and the rejection criterion is $m \le m_0$.

The above class of alternatives for the Hodges test is rather restrictive ($g_i(\xi, \eta)$ must be discontinuous at the origin, $i = 1, \cdots, n$) and does not seem particularly interesting. In the general case, with

gi(&, 0) 2 en a Rieti oéh 
md taea-e Nh ees 
the conditional power of Hodges' test, given that $(x_i - x_0, y_i - y_0) = (\xi_i, \eta_i)$ (with $0 > \eta_1/\xi_1 > \eta_2/\xi_2 > \cdots > \eta_j/\xi_j$, $0 < \eta_n/\xi_n < \eta_{n-1}/\xi_{n-1} < \cdots < \eta_{j+1}/\xi_{j+1}$ for some j, and $i = 1, \cdots, n$), is then


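$$\Pr\{K \le k_0\} = 1 - \sum_{t=k_0+1}^{n-k_0-1} P_n(t, k_0),$$

where

$$P_n(t, k_0) = \Pr\{k_0 < t + \omega_1 + \cdots + \omega_i < n - k_0,\ i = 1, 2, \cdots, n - 1;\ t + \omega_1 + \cdots + \omega_n = n - t\},$$

and $\omega_i$ takes on the values +1 or −1 with probabilities $P_i(\xi_i, \eta_i)$ and $1 - P_i(\xi_i, \eta_i)$, respectively.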
I wish to express my thanks to Professor Lincoln E. Moses for suggesting the 
relationship between the two statistics. 


REFERENCES 


[1] H. E. Daniels, "A distribution-free test for regression parameters," Ann. Math. Stat., Vol. 25 (1954), pp. 499-513.

[2] J. L. Hodges, Jr., "A bivariate sign test," Ann. Math. Stat., Vol. 26 (1955), pp. 523-527.

[3] Jerome Klotz, "Null distribution of the Hodges bivariate sign test," Ann. Math. Stat., Vol. 30 (1959), pp. 1029-1033.





MINIMAX SEQUENTIAL TESTS OF SOME COMPOSITE
HYPOTHESES¹


By M. H. DeGroot


Carnegie Institute of Technology 
1. Introduction and summary. Let $X(t)$, $t \ge 0$, be a Wiener process with unknown mean μ per unit time and unit variance per unit time. Thus, $X(0) = 0$ and for any $t_2 > t_1 \ge 0$, $X(t_2) - X(t_1)$ is normally distributed with mean $(t_2 - t_1)\mu$ and variance $t_2 - t_1$. Furthermore, for any sequence

$$0 \le t_{11} < t_{12} \le t_{21} < t_{22} \le \cdots \le t_{k1} < t_{k2},$$

the random variables $X(t_{i2}) - X(t_{i1})$, $i = 1, \cdots, k$, are independent.

The process may be observed continuously beginning at t = 0, and the problem is to decide between the hypotheses that $\mu \le \mu_0$ and $\mu > \mu_0$, where $\mu_0$ is a given number, which without loss of generality is taken as 0. Thus the hypotheses are

(1.1)  $$H_0: \mu \le 0, \qquad H_1: \mu > 0.$$

It is assumed that the cost of observing the process for a time t is bt, where b > 0, and that $W_i(\mu)$, the cost of accepting $H_i$ (i = 0, 1) when μ is the true mean, is of the form

(1.2)  $$W_0(\mu) = \begin{cases} 0 & \text{for } \mu \le 0, \\ c\mu^r & \text{for } \mu > 0, \end{cases} \qquad W_1(\mu) = \begin{cases} c|\mu|^r & \text{for } \mu \le 0, \\ 0 & \text{for } \mu > 0, \end{cases}$$

where $c > 0$ and $0 < r \le 2$.

The main result of this paper is that under these conditions the minimax decision procedure is a certain sequential probability ratio test (SPRT). The reason for restricting r to the interval $0 < r \le 2$ will be brought out in the derivation given in Section 3.

In Section 5, the analogous problem of testing the hypotheses (1.1) about the mean of a normal distribution is considered. The minimax procedure found for the Wiener process provides, in an obvious fashion, an approximation to the minimax procedure for this problem. Approximations of this type have been discussed in the literature. For r = 1, Moriguti [10] and Maurice [9] have found the approximate minimax procedure in a certain class of symmetric SPRT's. The same procedure is mentioned by Johnsen in the discussion following [8].


Received January 12, 1960. 


1 This research was supported by the National Science Foundation under grant NSF- 
G9662. 







1194 M. H. DE GROOT 


Breakwell [2], [3], [4] has treated similar problems for the binomial and Poisson distributions. The work presented here not only puts all of this on a rigorous basis for the Wiener process but also shows that, for the Wiener process, the minimax SPRT is in fact minimax among all decision procedures.

Finally, it is shown in Section 5 that if the cost per observation is large, the true minimax procedure for the normal decision problem is to take exactly one observation and then accept one of the hypotheses.


2. Loss functions and symmetric SPRT's. In this section I begin the discussion of the decision problem for the Wiener process. For any decision procedure δ let $P_i(\mu, \delta)$ denote the probability of accepting $H_i$, i = 0, 1, when μ is the true mean, and let $T(\mu, \delta)$ denote the expected total observation time when μ is the true mean. Then the loss function for the decision procedure δ is

(2.1)  $$L(\mu, \delta) = \sum_{i=0}^{1} W_i(\mu) P_i(\mu, \delta) + bT(\mu, \delta),$$

and it follows from (1.2) that this can be written as

(2.2)  $$L(\mu, \delta) = \begin{cases} c|\mu|^r P_1(\mu, \delta) + bT(\mu, \delta) & \text{for } \mu \le 0, \\ c\mu^r P_0(\mu, \delta) + bT(\mu, \delta) & \text{for } \mu > 0. \end{cases}$$

The problem is to find a decision procedure δ*, if one exists, such that

(2.3)  $$\max_\mu L(\mu, \delta^*) = \min_\delta \max_\mu L(\mu, \delta).$$


Of special importance is the class of symmetric SPRT's. A decision procedure belongs to this class if it satisfies the following conditions: (i) there exists a positive constant h such that the process is observed as long as $|X(t)| < h/2$; (ii) if at some t, $|X(t)| \ge h/2$, then observation stops and either $H_0$ or $H_1$ is accepted, according as $X(t) \le -h/2$ or $X(t) \ge h/2$.

The decision procedures $\delta_h$ belonging to this class are conveniently indexed by the positive constant h mentioned in the definition.

As is well known [7],

(2.4)  $$P_0(\mu, \delta_h) = 1/(e^{\mu h} + 1), \qquad P_1(\mu, \delta_h) = 1 - P_0(\mu, \delta_h), \qquad T(\mu, \delta_h) = h(e^{\mu h} - 1)/[2\mu(e^{\mu h} + 1)].$$

The singularity of $T(\mu, \delta_h)$ at μ = 0 is removable, and it is easily seen that $T(0, \delta_h) = h^2/4$.
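The formulas (2.4) translate directly into code. The Python sketch below (the function name is illustrative) uses the identities $1/(e^x + 1) = \tfrac{1}{2}[1 - \tanh(x/2)]$ and $(e^x - 1)/(e^x + 1) = \tanh(x/2)$ to avoid overflow for large $\mu h$. It also handles the removable singularity of T at μ = 0 explicitly.

```python
import math


def symmetric_sprt_oc_asn(mu, h):
    """(2.4): P_0, P_1 and T for the symmetric SPRT delta_h (stop when |X(t)| >= h/2)."""
    half = math.tanh(mu * h / 2.0)
    p0 = 0.5 * (1.0 - half)            # = 1 / (e^{mu h} + 1)
    if mu == 0.0:
        t = h * h / 4.0                # limiting value at the removable singularity
    else:
        t = h * half / (2.0 * mu)      # = h (e^{mu h} - 1) / [2 mu (e^{mu h} + 1)]
    return p0, 1.0 - p0, t


for mu in (0.0, 0.5, 1.0, 2.0):
    print(mu, symmetric_sprt_oc_asn(mu, h=2.0))
```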
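Substituting these expressions in (2.2) gives, for any h > 0,

(2.5)  $$L(\mu, \delta_h) = \begin{cases} \dfrac{c\mu^r}{e^{\mu h} + 1} + \dfrac{bh(e^{\mu h} - 1)}{2\mu(e^{\mu h} + 1)} & \text{for } \mu \ge 0, \\ L(-\mu, \delta_h) & \text{for } \mu < 0. \end{cases}$$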


3. The minimax symmetric SPRT. In this section, values $h = h^*$ and $\mu = \mu^*$





MINIMAX SEQUENTIAL TESTS 


will be found such that

(3.1)  $$L(\mu^*, \delta_{h^*}) = \min_h L(\mu^*, \delta_h) = \max_\mu L(\mu, \delta_{h^*}).$$

Thus, $\delta_{h^*}$ will be the minimax decision procedure in the class of all symmetric SPRT's.

It is convenient to make the following transformation of variables:

(3.2)  $$h = (c/b)^{1/(r+2)}\,\eta, \qquad \mu = (b/c)^{1/(r+2)}\,m, \qquad \mathcal{L}(m, \eta) = (b^r c^2)^{-1/(r+2)}\, L(\mu(m), \delta_{h(\eta)}).$$

Clearly, if values $\eta = \eta^*$ and $m = m^*$ can be found such that

(3.3)  $$\mathcal{L}(m^*, \eta^*) = \min_\eta \mathcal{L}(m^*, \eta) = \max_m \mathcal{L}(m, \eta^*),$$

then the corresponding values of h and μ will satisfy (3.1).
Making the substitutions (3.2) in (2.5) gives, for any η > 0,

(3.4)  $$\mathcal{L}(m, \eta) = \begin{cases} \dfrac{m^r}{e^{m\eta} + 1} + \dfrac{\eta(e^{m\eta} - 1)}{2m(e^{m\eta} + 1)} & \text{for } m \ge 0, \\ \mathcal{L}(-m, \eta) & \text{for } m < 0. \end{cases}$$

The convenience of the substitutions (3.2) is seen in the elimination of the constants b and c from (3.4). Furthermore, the symmetry exhibited in (3.4) makes it possible to restrict the search for values m* and η* that satisfy (3.3) to the region m ≥ 0. Finally, it should be remembered that $\mathcal{L}(m, \eta)$ is, for each fixed η > 0, continuous at m = 0.

Now fix η > 0. An elementary computation yields, for m > 0,

(3.5)  $$\frac{\partial \mathcal{L}(m, \eta)}{\partial m} \gtreqless 0 \iff r(1 + e^{-m\eta}) - m\eta \gtreqless \eta^{r+2}\left[\frac{\sinh(m\eta)}{(m\eta)^{r+1}} - \frac{1}{(m\eta)^r}\right] = \eta^{r+2}\left[\frac{(m\eta)^{2-r}}{3!} + \frac{(m\eta)^{4-r}}{5!} + \cdots\right].$$

Denote the left-hand and right-hand sides of the final inequality in (3.5) by Φ and Ψ, respectively. Then Φ is a strictly decreasing function of m and

(3.6)  $$\lim_{m \to 0} \Phi = 2r, \qquad \lim_{m \to \infty} \Phi = -\infty.$$

For 0 < r < 2, Ψ is a strictly increasing function of m and

(3.7)  $$\lim_{m \to 0} \Psi = 0, \qquad \lim_{m \to \infty} \Psi = \infty.$$

Hence, for each fixed η > 0, there exists a unique positive value of m at which Φ = Ψ, and this value yields $\max_m \mathcal{L}(m, \eta)$.


Now consider the case when r = 2. Again, Ψ is a strictly increasing function of m, but now

(3.8)  $$\lim_{m \to 0} \Psi = \eta^4/6, \qquad \lim_{m \to \infty} \Psi = \infty.$$

A glance at (3.6) shows that if $\eta^4 < 24$ there will be a unique positive value of m at which Φ = Ψ, and this value will again yield $\max_m \mathcal{L}(m, \eta)$. However, if $\eta^4 \ge 24$, then $\max_m \mathcal{L}(m, \eta)$ occurs at m = 0. (It should be clear from (3.5) and this discussion why the values r > 2 are excluded from consideration.)

Similarly, for fixed m > 0, an easy computation gives

(3.9)  $$\frac{\partial \mathcal{L}(m, \eta)}{\partial \eta} \gtreqless 0 \iff \frac{\sinh(m\eta)}{m\eta} + 1 \gtreqless \frac{m^{r+1}}{\eta} \iff 2 + \frac{(m\eta)^2}{3!} + \frac{(m\eta)^4}{5!} + \cdots \gtreqless \frac{m^{r+1}}{\eta}.$$

It is clear from the final inequality in (3.9) that there is a unique positive value of η at which equality holds, and this is the value that yields $\min_\eta \mathcal{L}(m, \eta)$ for each m > 0.

It follows from this discussion that if positive values $m = m^*$ and $\eta = \eta^*$ can be found that simultaneously satisfy the equations

(3.10)  $$r(1 + e^{-m\eta}) - m\eta = \frac{\eta^2}{m^r}\left[\frac{\sinh(m\eta)}{m\eta} - 1\right], \qquad \frac{\sinh(m\eta)}{m\eta} + 1 = \frac{m^{r+1}}{\eta},$$

and the added condition that, when r = 2, $\eta^{*4} < 24$, then the values m* and η* will satisfy the minimax equation (3.3).

Setting $m = v/\eta$ in the second equation of (3.10) gives

(3.11)  $$\eta^{r+2} = v^{r+2}/(v + \sinh v),$$

and making these substitutions in the first equation of (3.10) yields

(3.12)  $$r(1 + e^{-v}) = 2v \sinh v/(v + \sinh v).$$

A routine analysis shows that there is a unique positive value of v satisfying (3.12) and, consequently, there exist unique values m* > 0 and η* > 0 satisfying (3.10). It remains to show that when r = 2, $\eta^{*4} < 24$. From (3.11), it follows that it is sufficient to show that

(3.13)  $$1/v^3 + \sinh v/v^4 > 1/24$$

for all v > 0. An examination of the first few terms in the series expansion of the left-hand side of (3.13) shows that this inequality holds.

Thus, the following result has been obtained. Let v* be the unique positive solution of (3.12). Let

(3.14)  $$\eta^* = \left[v^{*(r+2)}/(v^* + \sinh v^*)\right]^{1/(r+2)}, \qquad m^* = v^*/\eta^*.$$

Then m* and η* satisfy (3.3) and, hence, the values

(3.15)  $$h^* = (c/b)^{1/(r+2)}\,\eta^*, \qquad \mu^* = (b/c)^{1/(r+2)}\,m^*,$$

satisfy the minimax equation (3.1).
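The recipe (3.12) to (3.15) is easy to carry out numerically. The Python sketch below (function names illustrative) solves (3.12) by bisection, using $g(0^+) = 2r > 0$ and $g(v) \to -\infty$. It forms η* and m* from (3.14) and maps them to h* and μ* by (3.15). The test then checks the saddle-point property (3.3) on a grid by evaluating the scaled risk (3.4). The values b = 0.01 and c = 1 in the usage line are arbitrary illustrations.

```python
import math


def scaled_risk(m, eta, r):
    """The scaled risk of (3.4), symmetric in m, with the m = 0 value eta^2/4."""
    m = abs(m)
    if m == 0.0:
        return eta * eta / 4.0
    th = math.tanh(m * eta / 2.0)
    # 1/(e^x + 1) = (1 - tanh(x/2))/2 and (e^x - 1)/(e^x + 1) = tanh(x/2), x = m * eta
    return m ** r * (1.0 - th) / 2.0 + eta * th / (2.0 * m)


def solve_v(r, tol=1e-13):
    """Unique positive root of (3.12): r(1 + e^{-v}) = 2 v sinh v / (v + sinh v)."""
    def g(v):
        s = math.sinh(v)
        return r * (1.0 + math.exp(-v)) - 2.0 * v * s / (v + s)

    lo, hi = 1e-9, 1.0
    while g(hi) > 0.0:       # g(0+) = 2r > 0, and g(v) -> -infinity
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)


def minimax_point(r):
    """(m*, eta*) from (3.14)."""
    v = solve_v(r)
    eta = (v ** (r + 2) / (v + math.sinh(v))) ** (1.0 / (r + 2))
    return v / eta, eta


def minimax_boundary(b, c, r):
    """(h*, mu*) from (3.15)."""
    m, eta = minimax_point(r)
    scale = (c / b) ** (1.0 / (r + 2))
    return scale * eta, m / scale


m_star, eta_star = minimax_point(1.0)
print(m_star, eta_star, minimax_boundary(b=0.01, c=1.0, r=1.0))
```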


4. The minimax decision procedure. It will now be shown that the decision procedure $\delta_{h^*}$, derived in the preceding section as the minimax procedure in the class of symmetric SPRT's, is in fact minimax in the class of all decision procedures.

Consider the problem of finding the decision procedure that is Bayes against the a priori distribution that places probability ½ at each of the two values $\mu = \mu^*$ and $\mu = -\mu^*$. That is, it is desired to find the procedure δ that minimizes

(4.1)  $$\rho(\delta) = \tfrac{1}{2}[L(\mu^*, \delta) + L(-\mu^*, \delta)].$$
It is well known [7] that a Bayes solution for this problem is either a procedure that makes an outright decision without any observation of the process, or else a procedure of the following form. The process is observed as long as

(4.2)  $$B < \frac{\exp[\mu^* X(t) - \mu^{*2} t/2]}{\exp[-\mu^* X(t) - \mu^{*2} t/2]} = e^{2\mu^* X(t)} < A,$$

where B < 1 < A are constants, or equivalently, as long as

(4.3)  $$\ln B/(2\mu^*) < X(t) < \ln A/(2\mu^*).$$

Observation stops and the appropriate hypothesis is accepted as soon as either inequality in (4.3) is broken.

In the problem being considered here, the Bayes procedure cannot be an outright decision. This follows from the fact that for any procedure $\delta_0$ specifying an outright decision,

(4.4)  $$\rho(\delta_0) = c\mu^{*r}/2 = \lim_{h \to 0} L(\mu^*, \delta_h) > L(\mu^*, \delta_{h^*}) = \rho(\delta_{h^*}).$$

Furthermore, it follows easily from the derivation given in [13] or [1] that, because of the symmetry of the a priori distribution and the cost function, it must be true that ln A = −ln B in the Bayes procedure (4.3). (Expressed in other terms, if it is worthwhile to continue observation when the a posteriori probability that μ = μ* is a, then it must also be worthwhile to continue observation when the a posteriori probability that μ = −μ* is a, and conversely.) Thus, the Bayes procedure is a symmetric SPRT. It obviously must be $\delta_{h^*}$, since for any other symmetric SPRT $\delta_h$,

(4.5)  $$\rho(\delta_h) = L(\mu^*, \delta_h) > L(\mu^*, \delta_{h^*}) = \rho(\delta_{h^*}).$$


It may now be concluded that $\delta_{h^*}$ is minimax among all decision procedures. Indeed, for any decision procedure δ,

(4.6)  $$\rho(\delta) = \tfrac{1}{2}[L(\mu^*, \delta) + L(-\mu^*, \delta)] \ge \rho(\delta_{h^*}) = L(\mu^*, \delta_{h^*}).$$

Hence, either $L(\mu^*, \delta) \ge L(\mu^*, \delta_{h^*})$ or $L(-\mu^*, \delta) \ge L(\mu^*, \delta_{h^*})$. Since $L(\mu^*, \delta_{h^*})$ is the maximum value of $L(\mu, \delta_{h^*})$, the conclusion follows.

In the preceding development, the fact that the minimax decision procedure belongs to the class of symmetric SPRT's is a consequence of the existence of a pair, h* and μ*, satisfying (3.1). When r > 2, it cannot be concluded that the minimax procedure is a symmetric SPRT, because the existence of such a pair has not been established.


5. Tests of hypotheses about the mean of a normal distribution. In this section I consider the analogous problem of testing hypotheses about the mean μ of a normal distribution with unit variance. Thus, suppose $X_1, X_2, \cdots$ is a sequential sample of independent observations, each with this distribution. It is desired to decide between the hypotheses (1.1) when the cost per observation is b and the cost of an incorrect decision is given by (1.2).

The similarities between the problem treated in the preceding sections and the one now being considered are clear. The symmetric SPRT's defined in Section 2 have obvious counterparts here, with X(t) replaced by $\sum_{i=1}^{n} X_i$. The expressions given in (2.4) are the usual approximations [5], [7], [12] for the OC and ASN functions of these tests (where $T(\mu, \delta_h)$ is now interpreted as the expected number of observations). Finally, the optimal property of the SPRT used in Section 4 is applicable to the problem now being considered [13].

It follows from these statements that the minimax procedure derived above for the Wiener process can serve as an "approximate" minimax procedure for the problem now being considered. However, it will now be shown that for sufficiently large values of b/c the actual minimax procedure is to take exactly one observation and then accept one of the hypotheses.

A decision procedure is said to be a generalized SPRT if it is of the following type: there are given two sequences $\{\alpha_n\}$ and $\{\beta_n\}$, with $\beta_n < \alpha_n$ (either may be infinite) for n = 1, 2, ⋯; sampling continues as long as

(5.1)  $$\beta_n < \sum_{i=1}^{n} X_i < \alpha_n;$$

sampling stops and the appropriate hypothesis is accepted as soon as either inequality is broken.

It is known [6], [11] that the class of generalized SPRT's is essentially complete relative to the class of all decision procedures with bounded loss functions. It should be noted that since $W_0(\mu)$ and $W_1(\mu)$ are unbounded, any procedure with a bounded loss function must involve taking at least one observation.

Let δ* be the decision procedure under which one observation, $X_1$, is taken and either $H_0$ or $H_1$ is accepted, according as $X_1 \le 0$ or $X_1 > 0$. I will now show that δ* is minimax if b/c is sufficiently large.

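Clearly,

(5.2)  $$L(\mu, \delta^*) = \begin{cases} c\mu^r \Phi(-\mu) + b & \text{for } \mu \ge 0, \\ L(-\mu, \delta^*) & \text{for } \mu < 0, \end{cases}$$

where

(5.3)  $$\Phi(y) = \int_{-\infty}^{y} (2\pi)^{-1/2} e^{-z^2/2}\, dz.$$

Since $L(\mu, \delta^*) = L(-\mu, \delta^*)$, the maximum value of $L(\mu, \delta^*)$ occurs at two points, say $\mu = \pm\mu_0$, with $\mu_0 > 0$.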

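Now let δ be any other generalized SPRT. If $0 \le \beta_1 \le \alpha_1$, then $L(\mu_0, \delta) \ge L(\mu_0, \delta^*)$, since the probability of making an incorrect decision on the first observation is at least as large using δ as it is using δ*. Similarly, if $\beta_1 \le \alpha_1 \le 0$, then $L(-\mu_0, \delta) \ge L(-\mu_0, \delta^*)$.

Finally, suppose that $\beta_1 < 0 < \alpha_1$ and let $\Pr\{\beta_1 < X_1 < \alpha_1 \mid \mu_0\} = \xi > 0$. Then

(5.4)  $$L(\mu_0, \delta) = c\mu_0^r \Pr\{\text{Acc. } H_0 \mid \mu_0, \delta\} + bE\{n \mid \mu_0, \delta\} \ge c\mu_0^r \Pr\{X_1 \le \beta_1 \mid \mu_0, \delta\} + b[(1 - \xi) + 2\xi] = c\mu_0^r \Phi(\beta_1 - \mu_0) + b + b\xi.$$

But

(5.5)  $$\xi = \Phi(\alpha_1 - \mu_0) - \Phi(\beta_1 - \mu_0),$$

and hence

(5.6)  $$\Phi(\beta_1 - \mu_0) = \Phi(\alpha_1 - \mu_0) - \xi > \Phi(-\mu_0) - \xi.$$

Thus

(5.7)  $$L(\mu_0, \delta) > c\mu_0^r \Phi(-\mu_0) + b + \xi(b - c\mu_0^r).$$

It follows that if $b \ge c\mu_0^r$, then $L(\mu_0, \delta) > L(\mu_0, \delta^*)$ and δ* is minimax.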
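For μ ≥ 0, $L(\mu, \delta^*) = c\mu^r\Phi(-\mu) + b$. So $\mu_0$ maximizes $\mu^r\Phi(-\mu)$ and does not depend on b or c, and the condition $b \ge c\mu_0^r$ is a condition on b/c alone. The small Python sketch below (names illustrative) locates $\mu_0$ by bisection on the first-order condition $r\Phi(-\mu) = \mu\varphi(\mu)$, where φ is the standard normal density. The root is unique because $r/\mu$ decreases while $\varphi(\mu)/\Phi(-\mu)$ increases.

```python
import math


def upper_tail(mu):
    """Phi(-mu) for the standard normal c.d.f. Phi of (5.3)."""
    return 0.5 * math.erfc(mu / math.sqrt(2.0))


def density(mu):
    """Standard normal density phi(mu)."""
    return math.exp(-mu * mu / 2.0) / math.sqrt(2.0 * math.pi)


def mu0(r, tol=1e-12):
    """Maximizer over mu > 0 of mu^r Phi(-mu): root of r Phi(-mu) - mu phi(mu) = 0."""
    lo, hi = 1e-12, 10.0   # the condition is positive near 0 and negative at 10 for 0 < r <= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if r * upper_tail(mid) - mid * density(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for r in (0.5, 1.0, 2.0):
    m0 = mu0(r)
    print(f"r = {r}: mu_0 = {m0:.4f}; delta* is minimax once b/c >= {m0 ** r:.4f}")
```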

It is interesting to note that when $b \ge c\mu_0^r$ there is no least favorable a priori distribution. In fact, the Bayes procedure against the a priori distribution that places probability ½ at each of the values $\mu = \pm\mu_0$ is an outright decision.


6. Acknowledgment. I wish to thank L. J. Savage, who originally suggested 
problems of this type to me while I was at the University of Chicago. 
REFERENCES 

[1] D. Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, John Wiley and Sons, New York, 1954, Chap. 10.

[2] J. V. Breakwell, "The problem of testing for the fraction of defectives," J. Op. Res. Soc. Amer., Vol. 2 (1954), pp. 59-49.

[3] J. V. Breakwell, "Minimax test for the parameter of a Poisson process," (abstract) Ann. Math. Stat., Vol. 26 (1955), p. 768.

[4] J. V. Breakwell, "Economically optimum acceptance tests," J. Amer. Stat. Assn., Vol. 51 (1956), pp. 243-256.

[5] D. A. Darling and A. J. F. Siegert, "The first passage problem for a continuous Markov process," Ann. Math. Stat., Vol. 24 (1953), pp. 624-639.

[6] M. H. DeGroot, "The essential completeness of the class of generalized sequential probability ratio tests," submitted to Ann. Math. Stat.



















[7] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Sequential decision problems for processes with continuous time parameter. Testing hypotheses," Ann. Math. Stat., Vol. 24 (1953), pp. 254-264.

[8] P. M. Grundy, M. J. R. Healy, and D. H. Rees, "Economic choice of the amount of experimentation," J. Roy. Stat. Soc., Ser. B, Vol. 18 (1956), pp. 32-55.

[9] R. J. Maurice, "A minimax procedure for choosing between two populations using sequential sampling," J. Roy. Stat. Soc., Ser. B, Vol. 19 (1957), pp. 255-261.

[10] S. Moriguti, "Notes on sampling inspection plans," Rep. Stat. Appl. Res., JUSE, Vol. 3 (1955), pp. 1-23.

[11] M. Sobel, "An essentially complete class of decision functions for certain standard sequential problems," Ann. Math. Stat., Vol. 24 (1953), pp. 319-337.

[12] A. Wald, Sequential Analysis, John Wiley and Sons, New York, 1947, Chap. 7.

[13] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio test," Ann. Math. Stat., Vol. 19 (1948), pp. 326-339.





INVERSE BINOMIAL SAMPLING PLANS WHEN AN EXPONENTIAL 
DISTRIBUTION IS SAMPLED WITH CENSORING 


By Jack Nadler
Bell Telephone Laboratories, Inc. 


1. Introduction. Inverse binomial sampling plans are discussed in a number of 
papers in the literature, e.g., [3], [6], [7], [8], [13]. In addition, inverse sampling 
procedures have been proposed for some problems of drawing inferences from 
counted data about a parameter other than a population proportion, e.g., in 
[1] and [12]. In this note it is pointed out that inverse binomial sampling plans 
can be applied in certain situations where continuous observations must be 
taken with censoring. In particular, when the observations follow an exponential 
distribution and large values are censored, the use of such a sampling plan gives 
rise to a complete, sufficient statistic [10] that has a simple, well known distribu- 
tion. This result is closely related to the work of Malmquist [11]. 


2. The statistical model. Let X be a random variable with c.d.f.

(1)  $$F_\theta(x) = \begin{cases} 1 - \exp(-x/\theta), & \text{if } x > 0, \\ 0, & \text{if } x \le 0, \end{cases}$$

where $0 < \theta < \infty$ is unknown. Suppose that observations are to be taken from this distribution with the restriction that, for a known constant $x_0$, the value of X can be observed if and only if $X \le x_0$; otherwise, just the information that $X > x_0$ is provided. From a sample of observations with this sort of censoring (sometimes called "single censoring on the right"), it is required that an inference be made about the value of θ.
To apply a binomial sampling plan to this situation, define the random variable

$$U = \begin{cases} 1, & \text{if } X \le x_c,\\ 0, & \text{if } X > x_c. \end{cases} \tag{2}$$

Clearly, $U$ is a Bernoulli chance variable,

$$\Pr\{U = 1\} = 1 - \exp(-x_c/\theta) = p, \qquad \Pr\{U = 0\} = 1 - p = q, \tag{3}$$

that can be sampled only by taking observations on $X$ subject to censoring. Thus, a sampling plan for the distribution of $U$ specifies a procedure for sampling from the distribution of $X$ with censoring.

The proposed sampling plan is defined in terms of taking observations on $U$: for a given positive integer $r$, a sequence of independent observations, say $U_1, U_2, \ldots$, is sampled until, for the first time, $\sum_{i=1}^{n} U_i = r$. Then sampling is terminated. This inverse binomial sampling plan implies that independent observations on $X$ are taken sequentially until $r$ values of $X \le x_c$, say $Y_1, \ldots, Y_r$, are obtained. In the experiment, the total number of observations $N$ taken with censoring (including $Y_1, \ldots, Y_r$) is a random variable.

Received November 28, 1959; revised May 6, 1960.


3. A sufficient statistic. The method of maximum likelihood suggests the use of the statistic

$$S = \sum_{i=1}^{r} Y_i + (N - r)x_c \tag{4}$$

for the analysis of such an experiment. Some important properties of $S$ are now derived.

Theorem: With the proposed sampling plan, $S$ is a sufficient statistic and $2S/\theta$ has a $\chi^2$ distribution with $2r$ degrees of freedom.
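The plan and the theorem can be checked by simulation. The sketch below is a minimal Monte Carlo illustration, not part of the original note; the function names, the seed, and the values $\theta = 2$, $x_c = 1$, $r = 5$ are arbitrary assumptions. It draws censored exponential observations until $r$ values $X \le x_c$ appear, forms $S$ as in (4), and compares the sample mean and variance of $2S/\theta$ with $2r$ and $4r$, the mean and variance of a $\chi^2$ variable with $2r$ degrees of freedom.

```python
import random

def inverse_binomial_sample(theta, x_c, r, rng):
    """Observe X ~ exponential(mean theta), censored at x_c, until r values X <= x_c occur.

    Returns the uncensored values Y_1, ..., Y_r and the total number N of observations.
    """
    ys, n_obs = [], 0
    while len(ys) < r:
        x = rng.expovariate(1.0 / theta)  # expovariate takes the rate 1/theta
        n_obs += 1
        if x <= x_c:
            ys.append(x)  # the value of X is observed
        # otherwise only the event X > x_c is recorded
    return ys, n_obs

def statistic_s(ys, n_obs, x_c):
    """S = sum of the Y_i plus (N - r) x_c, the statistic in (4)."""
    return sum(ys) + (n_obs - len(ys)) * x_c

rng = random.Random(1960)
theta, x_c, r, reps = 2.0, 1.0, 5, 20_000
values = []
for _ in range(reps):
    ys, n_obs = inverse_binomial_sample(theta, x_c, r, rng)
    values.append(2 * statistic_s(ys, n_obs, x_c) / theta)

mean = sum(values) / reps
var = sum((v - mean) ** 2 for v in values) / (reps - 1)
print(f"mean of 2S/theta: {mean:.3f} (2r = {2 * r}); variance: {var:.3f} (4r = {4 * r})")
```

With 20,000 replications the simulated mean and variance should be close to 10 and 20; agreement of two moments is of course only a sanity check, not a proof of the distributional result.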

Proof: Regard the given experiment as $r$ independent repetitions of an experiment in which observations are taken one at a time with censoring until a single value of $X \le x_c$ is obtained. In the $i$th repetition ($i = 1, \ldots, r$), let $Y_i$ be the observed value of $X$ and $N_i - 1$ the number of censored observations immediately preceding $Y_i$. (Each censored observation is known to be greater than $x_c$.) Define

$$S_i = Y_i + (N_i - 1)x_c. \tag{5}$$

Since, with probability one,

$$N_i = 1 + [S_i/x_c], \qquad Y_i = S_i - x_c[S_i/x_c], \tag{6}$$

where $[x]$ denotes the integral part of $x$, $S_i$ is sufficient for the joint distribution of $Y_i$ and $N_i$.
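Relation (6) is simply integer division by $x_c$. A minimal sketch (the function name is hypothetical) that recovers $(N_i, Y_i)$ from $S_i$, assuming $0 < Y_i < x_c$, which holds with probability one:

```python
import math

def split_s(s_i, x_c):
    """Invert S_i = Y_i + (N_i - 1) x_c using (6); assumes 0 < Y_i < x_c."""
    k = math.floor(s_i / x_c)  # [S_i / x_c], the integral part
    return 1 + k, s_i - x_c * k  # (N_i, Y_i)

# Three censored observations (each known only to exceed x_c = 1.5), then Y_i = 0.4:
s_i = 0.4 + 3 * 1.5
n_i, y_i = split_s(s_i, 1.5)
print(n_i, round(y_i, 10))  # prints 4 0.4
```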

To determine the distribution of $S_i$, note that there is no loss of generality in assuming that $X$ is the waiting time up to a change of state in a homogeneous Poisson process, where changes occur at the mean rate of $1/\theta$ per unit time. From this point of view, taking an observation with censoring has the interpretation that, starting from an arbitrary moment in time, the process is observed for an interval of $\min(X, x_c)$ units of time. Drawing a sample of observations with censoring then amounts to observing a sequence of these time intervals that are disjoint, but not necessarily adjacent. Using the properties of a Poisson process, write

$$P_0(t) = \exp(-t/\theta) \tag{7}$$

for the probability that no change occurs in an interval of length $t$.
For arbitrary $x > 0$, let $m = [x/x_c]$ and consider

$$\Pr\{S_i \le x\} = \Pr\{S_i \le m x_c\} + \Pr\{m x_c < S_i \le x\}. \tag{8}$$

The event $S_i \le m x_c$, in terms of the Poisson process, is the complement of the event that no change is observed in each of $m$ non-overlapping intervals of length $x_c$. Furthermore, $m x_c < S_i \le x$ is equivalent to the intersection of the event that no change is observed in $m$ disjoint intervals with the event that at least one change is observed in an additional interval of length $x - m x_c$ that does not overlap any of the previous $m$. Hence,

$$\Pr\{S_i \le x\} = 1 - \{P_0(x_c)\}^m + \{P_0(x_c)\}^m\{1 - P_0(x - m x_c)\} = 1 - \exp(-x/\theta). \tag{9}$$


Therefore, $S_1, \ldots, S_r$ form a random sample from the exponential distribution (1). Since $S = \sum_{i=1}^{r} S_i$, the theorem follows from well-known properties of the $\Gamma$ distribution (see, e.g., Example 17.8 of [9])¹.
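The algebra behind (9) can be confirmed numerically: the two $\{P_0(x_c)\}^m$ terms cancel and $\{P_0(x_c)\}^m P_0(x - m x_c) = \exp(-x/\theta)$. The short check below (values of $\theta$, $x_c$ and $x$ chosen arbitrarily, function name hypothetical) evaluates the middle expression of (9) directly and compares it with the exponential c.d.f.

```python
import math

def rhs_of_9(x, theta, x_c):
    """1 - P0(x_c)^m + P0(x_c)^m * (1 - P0(x - m x_c)) with m = [x / x_c] and P0 from (7)."""
    def p0(t):
        return math.exp(-t / theta)  # probability of no change in an interval of length t
    m = math.floor(x / x_c)
    return 1 - p0(x_c) ** m + p0(x_c) ** m * (1 - p0(x - m * x_c))

theta, x_c = 2.0, 0.7
checks = {x: rhs_of_9(x, theta, x_c) - (1 - math.exp(-x / theta)) for x in (0.3, 0.7, 2.45, 10.0)}
print(checks)  # differences are at rounding-error level
```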

Corollary. The distribution of $S$ belongs to a complete family of distributions.

A proof of the corollary is supplied by Example 3.5 of [10].

It is not difficult to verify that the theorem does not hold for the analogue of $S$ in the single sample case [2], i.e., when the total number of observations taken with censoring is fixed and the number of values of $X \le x_c$ is a random variable. Indeed, the principal advantage of the inverse binomial sampling plan is that appropriate procedures are often well known or easily derived. The completeness property provides easy proofs of the optimal character of many of these procedures.


4. Remarks. 

(i) It is not essential to the sampling plan that observations be taken one at a time. The results of the theorem are valid, for example, in the following multiple-stage experiment: In the first stage, a sample of $r$ independent observations on $X$ is drawn. The experiment is terminated if $r$ values of $X \le x_c$ are observed; otherwise, the experiment is continued. At each successive stage a sample of observations is taken whose size is the additional number of values of $X \le x_c$ required to bring the cumulative number of such values (i.e., the number observed in the entire experiment) up to $r$, so that this cumulative number never exceeds $r$. Experimentation is terminated as soon as $r$ values of $X \le x_c$ are obtained.
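A sketch of the multiple-stage version in Remark (i), under the same illustrative assumptions as a simulation of the basic plan would use (exponential $X$ with mean $\theta$; hypothetical function names, seed and parameter values). Each stage draws exactly as many observations as uncensored values are still missing, so the $r$th value $X \le x_c$ can only arrive on the last draw of a stage and the total count never overshoots $r$; the mean of $2S/\theta$ should again be close to $2r$.

```python
import random

def multistage_plan(theta, x_c, r, rng):
    """Stage sizes equal the number of values X <= x_c still needed; stop once r are observed."""
    ys, n_obs, stages = [], 0, 0
    while len(ys) < r:
        stage_size = r - len(ys)
        stages += 1
        for _ in range(stage_size):
            x = rng.expovariate(1.0 / theta)  # X ~ exponential with mean theta
            n_obs += 1
            if x <= x_c:
                ys.append(x)
    return ys, n_obs, stages

rng = random.Random(35)
theta, x_c, r, reps = 2.0, 1.0, 5, 20_000
runs = [multistage_plan(theta, x_c, r, rng) for _ in range(reps)]
# S as in (4): every observation that is not one of the Y_i was censored at x_c
mean_stat = sum(2 * (sum(ys) + (n - r) * x_c) / theta for ys, n, _ in runs) / reps
print(f"mean of 2S/theta over {reps} runs: {mean_stat:.3f} (2r = {2 * r})")
```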

(ii) The "nice" properties of $S$ are shared by a statistic proposed by Epstein and Sobel [4], [5]. From the point of view of the experimental design, however, the restriction imposed on the observable values of $X$ in this paper differs importantly from the one considered by those authors.

(iii) An inverse binomial sampling plan can, of course, be applied when X 
follows some distribution other than (1). Typically, results like the theorem above 
do not hold for such cases. At least from the point of view of maximum likelihood 
estimation, however, it is frequently advantageous to use such a sampling 
plan. 


5. Acknowledgments. The author is grateful for the helpful comments and 
criticism received in connection with this work from M. H. DeGroot, R. B. 
Murphy, and W. L. Roach, Jr. 





¹ M. Sobel, John W. Tukey, and the referee independently suggested that an earlier derivation of the distribution of $S_i$ might be replaced by one proceeding along the lines presented here.










REFERENCES

[1] Douglas G. Chapman, "Inverse, multiple and sequential sample censuses," Biometrics, Vol. 8 (1952), pp. 286-306.

[2] A. Clifford Cohen, Jr., "Maximum likelihood estimation of the dispersion parameter of a chi-distributed radial error from truncated and censored samples with applications to target analysis," J. Amer. Stat. Assn., Vol. 50 (1954), pp. 1122-1135.

[3] Morris H. DeGroot, "Unbiased sequential estimation for binomial populations," Ann. Math. Stat., Vol. 30 (1959), pp. 80-101.

[4] B. Epstein and M. Sobel, "Life testing," J. Amer. Stat. Assn., Vol. 48 (1953), pp. 486-502.

[5] B. Epstein and M. Sobel, "Some theorems relevant to life testing from an exponential distribution," Ann. Math. Stat., Vol. 25 (1954), pp. 373-381.

[6] D. J. Finney, "On a method of estimating frequencies," Biometrika, Vol. 36 (1949), pp. 233-234.

[7] M. A. Girshick, F. Mosteller, and L. J. Savage, "Unbiased estimates for certain binomial sampling problems with applications," Ann. Math. Stat., Vol. 17 (1946), pp. 13-23.

[8] J. B. S. Haldane, "On a method of estimating frequencies," Biometrika, Vol. 33 (1945), pp. 222-225.

[9] Maurice G. Kendall, The Advanced Theory of Statistics, Vol. II, 2nd Ed., Charles Griffin & Company Ltd., London, 1948.

[10] E. L. Lehmann and H. Scheffé, "Completeness, similar regions, and unbiased estimation, Part I," Sankhyā, Vol. 10 (1950), pp. 305-340.

[11] Sten Malmquist, "A statistical problem connected with the counting of radioactive particles," Ann. Math. Stat., Vol. 18 (1947), pp. 255-264.

[12] Martin Sandelius, "An inverse sampling procedure for bacterial plate counts," Biometrics, Vol. 6 (1950), pp. 291-292.

[13] M. C. K. Tweedie, "Inverse statistical variates," Nature, Vol. 155 (1945), p. 453.





NOTES 


A CONSERVATIVE PROPERTY OF BINOMIAL TESTS¹
By H. A. David
Virginia Polytechnic Institute

Consider $n$ independent binomial trials with common probability of success $\pi$. We shall be concerned with the three binomial tests of the null hypothesis $H_0\colon \pi = \pi_0$ $(0 < \pi_0 < 1)$ corresponding to the alternative hypotheses (i) $H_1\colon \pi > \pi_0$, (ii) $H_2\colon \pi < \pi_0$, and (iii) $H_3\colon \pi \ne \pi_0$.

There are many situations in which the probability of success does in fact vary from trial to trial, being $\pi_i$ for the $i$th trial $(i = 1, 2, \ldots, n)$. One may then wish to test the modified null hypothesis $\bar H_0\colon \bar\pi = \pi_0$, where $\bar\pi$ is the mean of the $\pi_i$.

It is the purpose of this note to show that the ordinary tests of $H_0$ are conservative tests of $\bar H_0$. More precisely, letting $S_n$ denote the number of successes in $n$ trials, we shall prove that the inequality

$$\Pr(S_n \ge a_n \mid H_0) \ge \Pr(S_n \ge a_n \mid \bar H_0) \tag{1}$$

holds for any integer $a_n$ such that $n\pi_0 + 1 \le a_n \le n$. This result is relevant to case (i). At the ordinary levels of significance, the fact that $a_n$ has to exceed the expected value of $S_n$ by at least one is no limitation. The corresponding result for case (ii) follows by symmetry, viz.,

$$\Pr(S_n \le b_n \mid H_0) \ge \Pr(S_n \le b_n \mid \bar H_0), \tag{2}$$

where $0 \le b_n \le n\pi_0 - 1$. Since (2) implies

$$\Pr(S_n > b_n \mid H_0) \le \Pr(S_n > b_n \mid \bar H_0),$$

it may be noted, on taking $a_n = b_n + 1$, that the inequality in (1) is reversed if $a_n \le n\pi_0$. Adding (1) and (2), we obtain the inequality appropriate for the two-sided alternative (iii).
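Inequalities (1) and (2) are easy to check numerically for any particular set of $\pi_i$. The sketch below (hypothetical helper name; the $\pi_i$ are arbitrary illustrative values) computes the exact distribution of $S_n$ under $\bar H_0$ by adding the $n$ Bernoulli trials one at a time, and compares its tails with the ordinary binomial tails under $H_0$, taking $\pi_0$ equal to the mean of the $\pi_i$. It also computes the variance of $S_n$ under both hypotheses.

```python
def success_count_pmf(pis):
    """Exact pmf of S_n when trial i succeeds independently with probability pis[i]."""
    pmf = [1.0]
    for p in pis:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # trial fails
            nxt[k + 1] += mass * p     # trial succeeds
        pmf = nxt
    return pmf

pis = [0.05, 0.2, 0.35, 0.5, 0.6, 0.4]  # unequal pi_i (illustrative)
n = len(pis)
pi0 = sum(pis) / n                       # H0-bar: the mean of the pi_i equals pi0
under_h0bar = success_count_pmf(pis)
under_h0 = success_count_pmf([pi0] * n)  # ordinary binomial(n, pi0)

for a in range(n + 1):
    if a >= n * pi0 + 1:                 # range of a_n in (1)
        print("a =", a, sum(under_h0[a:]), ">=", sum(under_h0bar[a:]))
for b in range(n + 1):
    if b <= n * pi0 - 1:                 # range of b_n in (2)
        print("b =", b, sum(under_h0[:b + 1]), ">=", sum(under_h0bar[:b + 1]))

var_h0 = n * pi0 * (1 - pi0)
var_h0bar = sum(p * (1 - p) for p in pis)  # smaller unless all pi_i are equal
print("variance under H0:", var_h0, "under H0-bar:", var_h0bar)
```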

These results are obtained in the course of an ingenious but complicated argument by Hoeffding [2]. The proof given here may, however, be of interest in view of its relative simplicity.

To prove (1) we proceed by induction. For $n = 2$, $a_2$ must equal 2, and

$$P = \Pr(S_2 = 2 \mid \bar H_0) = \pi_1\pi_2$$

is a maximum for $\pi_1 = \pi_2 = \pi_0$, i.e., under $H_0$.

Received August 28, 1959; revised January 8, 1960.

¹ Research supported by Office of Ordnance Research, U. S. Army Contract No. DA-36-034-ORD-1527 RD.

Suppose next that (1) is true for $n - 1$ trials. Then

$$P = \Pr(S_n \ge a_n \mid \bar H_0) = \sum_{x_n = 0}^{1} \Pr(S_{n-1} \ge a_n - x_n)\Pr(x_n), \tag{3}$$

where $x_n$ is the characteristic random variable describing the $n$th trial and taking the values 1 and 0 with probabilities $\pi_n$ and $(1 - \pi_n)$, respectively. For simplicity of writing we omit showing the dependence of the right-hand side of (3) on $\bar H_0$. With this understanding it follows that

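$$P = (1 - \pi_n)\Pr(S_{n-1} \ge a_n) + \pi_n\Pr(S_{n-1} \ge a_n - 1) = \Pr(S_{n-1} \ge a_n) + \pi_n\Pr(S_{n-1} = a_n - 1).$$

Since $a_n \ge n\pi_0 + 1 \ge (n-1)\pi^* + 1$, where $\pi^*$ is defined in (4), we have by hypothesis that $\Pr(S_{n-1} \ge a_n)$ is a maximum, for a given value of $\pi_n$, if

$$\pi_1 = \pi_2 = \cdots = \pi_{n-1} = (n\pi_0 - \pi_n)/(n-1) = \pi^* \text{ (say)}. \tag{4}$$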
$P$ now takes the form

$$P = \sum_{x = a_n}^{n-1}\binom{n-1}{x}(\pi^*)^x(1 - \pi^*)^{n-1-x} + \pi_n\binom{n-1}{a_n - 1}(\pi^*)^{a_n - 1}(1 - \pi^*)^{n - a_n},$$

and may be regarded as a function of $\pi_n$ only, $n$, $a_n$, $\pi_0$ being specified. Since $d\pi^*/d\pi_n = -1/(n-1)$, we have

$$\frac{dP}{d\pi_n} = \frac{1}{n-1}\binom{n-1}{a_n - 1}(\pi^*)^{a_n - 2}(1 - \pi^*)^{n - a_n - 1}F,$$

where

$$F = -(n - a_n)\pi^*(1 - \pi_n) + (n - 1)\pi^*(1 - \pi^*) - (a_n - 1)\pi_n(1 - \pi^*).$$

By (4), $\pi^* = 1$ gives $n\pi_0 = n - 1 + \pi_n$. But

$$a_n \ge n\pi_0 + 1 = n + \pi_n,$$

which leaves $n$ as the only possible value of $a_n$, so that $\pi^* = 1$ does not lead to a zero of $dP/d\pi_n$. The case $\pi^* = 0$ is discussed below.







Turning to the zeros of $F$, we note that this is a quadratic in $\pi_n$, viz.,

$$(n - 1)F = -n(\pi_n - \pi_0)(\pi_n - n\pi_0 - 1 + a_n).$$

The condition $a_n \ge n\pi_0 + 1$ ensures that the root $\pi_n = \pi_0$ corresponds to a local maximum of $P$, a continuous function of $\pi_n$. The derivative $dP/d\pi_n$ vanishes also at $\pi^* = 0$, i.e., at $\pi_n = n\pi_0$, and, for $a_n = n\pi_0 + 1$, at $\pi_n = 0$. Since $\pi^* \ge 0$ implies by (4) that $\pi_n \le \min(n\pi_0, 1)$, it follows that $dP/d\pi_n = 0$ at $\pi_n = \pi_0$ and possibly at extreme values of $\pi_n$. Thus the local maximum of $P$ must be a true maximum, so that by (4) $P$ is a maximum for $\pi_i = \pi_0$ (all $i$), which proves (1).
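The algebra of the inductive step can be spot-checked numerically. This sketch (hypothetical names; $n = 6$, $\pi_0 = 0.35$, $a_n = 4$ chosen only for illustration, so that $n\pi_0 + 1 \le a_n \le n$) evaluates $P$ as a function of $\pi_n$ with $\pi_1 = \cdots = \pi_{n-1} = \pi^*$, compares $F$ with the factorized quadratic on a grid, and locates the maximum of $P$ over $0 \le \pi_n \le \min(n\pi_0, 1)$.

```python
from math import comb

def p_of_pin(pin, n, a, pi0):
    """P = Pr(S_{n-1} >= a) + pi_n Pr(S_{n-1} = a - 1), with pi_1 = ... = pi_{n-1} = pi* from (4)."""
    ps = (n * pi0 - pin) / (n - 1)
    tail = sum(comb(n - 1, x) * ps**x * (1 - ps) ** (n - 1 - x) for x in range(a, n))
    return tail + pin * comb(n - 1, a - 1) * ps ** (a - 1) * (1 - ps) ** (n - a)

def f_direct(pin, n, a, pi0):
    """F as defined after the derivative dP/d(pi_n)."""
    ps = (n * pi0 - pin) / (n - 1)
    return -(n - a) * ps * (1 - pin) + (n - 1) * ps * (1 - ps) - (a - 1) * pin * (1 - ps)

def f_factored(pin, n, a, pi0):
    """F from the factorization (n - 1) F = -n (pi_n - pi0)(pi_n - n pi0 - 1 + a)."""
    return -n * (pin - pi0) * (pin - n * pi0 - 1 + a) / (n - 1)

n, pi0, a = 6, 0.35, 4
upper = min(n * pi0, 1.0)
grid = [upper * k / 1000 for k in range(1001)]
best = max(grid, key=lambda t: p_of_pin(t, n, a, pi0))
max_gap = max(abs(f_direct(t, n, a, pi0) - f_factored(t, n, a, pi0)) for t in grid)
print("argmax of P over the grid:", best, "| max |F - factored F|:", max_gap)
```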

The question of what approximate corrections to make to the probabilities under $H_0$ to obtain the corresponding probabilities under $\bar H_0$ has been considered by Walsh [3]. He also points out the known result (Cramér [1]) that $S_n$ is asymptotically normal provided $\sum_{i=1}^{n}\pi_i(1 - \pi_i)$ diverges as $n \to \infty$. In this case, therefore, since modifying $H_0$ to $\bar H_0$ leaves the expectation of $S_n$ unchanged but reduces its variance, the above three results are to be expected in large samples.

I am grateful to the Editor for drawing my attention to references [2] and [3].


REFERENCES 

[1] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1946, pp. 217-218.

[2] Wassily Hoeffding, "On the distribution of the number of successes in independent trials," Ann. Math. Stat., Vol. 27 (1956), pp. 713-721.

[3] John E. Walsh, "Approximate probability values for observed number of 'successes' from statistically independent binomial events with unequal probabilities," Sankhyā, Vol. 15 (1955), pp. 281-290.





AN OPTIMUM PROPERTY OF REGULAR MAXIMUM
LIKELIHOOD ESTIMATION


By V. P. Godambe
Science College, Nagpur, India


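Let $x$ be a point of some abstract space $\mathcal{X}$, on which is defined a measure $\mu$. Further, let $p(x, \theta)$ denote a probability density function with respect to $\mu$ which, but for an unknown parameter $\theta$, is completely specified for all $x \in \mathcal{X}$. It is taken for granted that $\theta \in \Omega$, a given index set. We make the following assumptions:

(a) $\Omega$ is an open interval of the real line.

(b) For almost all $x$ $(\mu)$, $\partial \log p(x, \theta)/\partial\theta$ and $\partial^2 \log p(x, \theta)/\partial\theta^2$ exist for all $\theta \in \Omega$.

(c) $\int p(x, \theta)\,d\mu$ and $\int (\partial \log p(x, \theta)/\partial\theta)\,p(x, \theta)\,d\mu$ are differentiable under the integral sign.

(d) $E[(\partial \log p(x, \theta)/\partial\theta)^2 \mid \theta] > 0$ for all $\theta \in \Omega$.

To estimate $\theta$, we construct a function $g(x, \theta)$ on $\mathcal{X} \times \Omega$ such that

(i) $E[g(x, \theta) \mid \theta] = 0$ for all $\theta \in \Omega$,

(ii) for almost all $x$ $(\mu)$, $\partial g/\partial\theta$ exists for all $\theta \in \Omega$,

(iii) $\int g(x, \theta)\,p(x, \theta)\,d\mu$ is differentiable under the integral sign,

(iv) $[E(\partial g/\partial\theta \mid \theta)]^2 > 0$ for all $\theta \in \Omega$.

The condition (i) above is not much of a restriction, for, from any function $f(x, \theta)$ such that $E[f(x, \theta) \mid \theta] < \infty$, we can construct a $g(x, \theta)$ satisfying (i), namely $g(x, \theta) = f(x, \theta) - E[f(x, \theta) \mid \theta]$.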

Now, given such a function $g$, the procedure of estimating $\theta$ is as follows: if the observed value of $x$ is $X$, an estimate of $\theta$ is given by $\theta_0(X)$, where the equation

$$g(X, \theta) = 0 \tag{1}$$

is satisfied by $\theta = \theta_0(X)$.

For this reason, any function $g$ satisfying conditions (i)-(iv) above will be called a regular estimating function. Let $\mathcal{G}$ denote the class of all regular estimating functions $g$. From $\mathcal{G}$ we propose to select some $g^*$ which, in a certain sense, is an optimum estimating function.
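As an illustration of the estimation procedure, the sketch below solves $g(X, \theta) = 0$ by bisection. The model (a sample of independent exponential observations with mean $\theta$), the two estimating functions, and the helper names are assumptions made only for this example: one $g$ is the likelihood score, the other is built from the moment relation $E(X^2 \mid \theta) = 2\theta^2$. Both satisfy condition (i).

```python
def solve_estimating_equation(g, data, lo, hi, tol=1e-12):
    """Return theta in [lo, hi] with g(data, theta) = 0, by bisection (requires a sign change)."""
    g_lo = g(data, lo)
    if g_lo * g(data, hi) > 0:
        raise ValueError("g(data, theta) must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(data, mid)
        if g_lo * g_mid <= 0:
            hi = mid                   # root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid      # root lies in [mid, hi]
    return 0.5 * (lo + hi)

def score(data, theta):
    """d log p / d theta for independent exponential observations with mean theta."""
    return sum(-1 / theta + x / theta**2 for x in data)

def second_moment(data, theta):
    """Unbiased estimating function based on E(X^2 | theta) = 2 theta^2."""
    return sum(x * x - 2 * theta**2 for x in data)

data = [1.0, 2.0, 3.0]
theta_ml = solve_estimating_equation(score, data, 0.01, 100.0)           # the sample mean, 2.0
theta_mom = solve_estimating_equation(second_moment, data, 0.01, 100.0)  # sqrt(14 / 6)
print(theta_ml, theta_mom)
```

Bisection is used here only because it is simple and robust for a scalar $\theta$ once a bracketing interval is known; any root finder would serve.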


Definition: A $g^*$ belonging to $\mathcal{G}$ is said to be an optimum estimating function if

$$\frac{E(g^{*2} \mid \theta)}{[E(\partial g^*/\partial\theta \mid \theta)]^2} \le \frac{E(g^2 \mid \theta)}{[E(\partial g/\partial\theta \mid \theta)]^2} \quad \text{for all } g \in \mathcal{G} \text{ and } \theta \in \Omega. \tag{2}$$

Of course there is a certain amount of arbitrariness in the above definition of "optimum." It is not, however, altogether unreasonable. We want the values of $g$ to cluster around 0 as much as possible (i.e., $E(g^2 \mid \theta)$ should be as small as possible), and at the same time it is desirable that $E[g(x, \theta + \delta\theta) \mid \theta]$ should be as far away from 0 as possible. (This is conveniently translated as: $[E(\partial g/\partial\theta \mid \theta)]^2$ should be as large as possible.) This justifies the definition (2). The following Theorem establishes that $g^* = \partial \log p/\partial\theta$¹ satisfies (2) above. Hence follows the optimality of maximum likelihood estimation. It is important to note that the validity of the Theorem below is independent of the discussion of "optimality" in this paragraph.

Received July 28, 1959; revised May 17, 1960.

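Theorem: For all $g \in \mathcal{G}$,

$$\frac{E(g^2 \mid \theta)}{[E(\partial g/\partial\theta \mid \theta)]^2} \ge \frac{1}{E[(\partial \log p/\partial\theta)^2 \mid \theta]}, \tag{A}$$

the equality being attained for

$$g^* = \frac{\partial \log p}{\partial\theta}. \tag{B}$$

Then, obviously,

$$\frac{E(g^{*2} \mid \theta)}{[E(\partial g^*/\partial\theta \mid \theta)]^2} \le \frac{E(g^2 \mid \theta)}{[E(\partial g/\partial\theta \mid \theta)]^2}$$

for all $g \in \mathcal{G}$.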
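Proof: We have from (i), for $\theta \in \Omega$,

$$\int g(x, \theta)\,p(x, \theta)\,d\mu = 0. \tag{3}$$

Differentiating (3) under the integral sign (this is permissible because of (a), (b), (ii) and (iii)), we have

$$\int \frac{\partial g}{\partial\theta}\,p\,d\mu + \int g\,\frac{\partial \log p}{\partial\theta}\,p\,d\mu = 0. \tag{4}$$

Further, because of (c),

$$E\left(\frac{\partial \log p}{\partial\theta} \,\middle|\, \theta\right) = 0. \tag{5}$$

Thus the second integral in (4) is the covariance of $g$ and $\partial \log p/\partial\theta$, so that $[E(\partial g/\partial\theta \mid \theta)]^2 = [\operatorname{Cov}(g, \partial \log p/\partial\theta \mid \theta)]^2 \le E(g^2 \mid \theta)\,E[(\partial \log p/\partial\theta)^2 \mid \theta]$. It then follows from (d), (iv) and this Cauchy-Schwarz inequality that, for $\theta \in \Omega$,

$$\frac{E(g^2 \mid \theta)}{[E(\partial g/\partial\theta \mid \theta)]^2} \ge \frac{1}{E[(\partial \log p/\partial\theta)^2 \mid \theta]}. \tag{6}$$

¹ It follows from (a)-(d) that $\partial \log p/\partial\theta \in \mathcal{G}$.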


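This proves part (A) of the Theorem. Of course, when, instead of (iv), $E(\partial g/\partial\theta \mid \theta) = 0$, (A) is true trivially. Further, if

$$g^* = \frac{\partial \log p}{\partial\theta}, \tag{7}$$

we have from (a), (b) and (c) that

$$E\left[\frac{\partial^2 \log p}{\partial\theta^2} \,\middle|\, \theta\right] = -E\left[\left(\frac{\partial \log p}{\partial\theta}\right)^2 \,\middle|\, \theta\right], \tag{8}$$

and hence

$$\frac{E(g^{*2} \mid \theta)}{[E(\partial g^*/\partial\theta \mid \theta)]^2} = \frac{1}{E[(\partial \log p/\partial\theta)^2 \mid \theta]}. \tag{9}$$

This proves part (B) of the Theorem.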
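The criterion in (2) and the bound in (A) can be evaluated for a concrete model. In the sketch below, the model (a single exponential observation with mean $\theta$, for which $E[(\partial \log p/\partial\theta)^2 \mid \theta] = 1/\theta^2$), the three candidate estimating functions, and the helper names are assumptions for illustration; expectations are computed by Simpson's rule on a truncated range. The score and $x - \theta$ (which is proportional to the score in this model) should attain the bound $\theta^2$, while $x^2 - 2\theta^2$ should give $1.25\,\theta^2$.

```python
import math

def expect(h, theta, steps=20_000, upper=80.0):
    """E[h(X)] for X exponential with mean theta, by composite Simpson's rule on [0, upper*theta]."""
    b = upper * theta
    dx = b / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        x = k * dx
        total += weight * h(x) * math.exp(-x / theta) / theta
    return total * dx / 3

def godambe_ratio(g, dg, theta):
    """E(g^2 | theta) / [E(dg/dtheta | theta)]^2, the quantity compared in (2) and bounded in (A)."""
    return expect(lambda x: g(x, theta) ** 2, theta) / expect(lambda x: dg(x, theta), theta) ** 2

theta = 2.0
candidates = {  # name: (g(x, theta), dg/dtheta(x, theta))
    "score": (lambda x, t: -1 / t + x / t**2, lambda x, t: 1 / t**2 - 2 * x / t**3),
    "x - theta": (lambda x, t: x - t, lambda x, t: -1.0),
    "x^2 - 2 theta^2": (lambda x, t: x * x - 2 * t * t, lambda x, t: -4.0 * t),
}
info = expect(lambda x: (-1 / theta + x / theta**2) ** 2, theta)  # Fisher information, 1/theta^2
ratios = {name: godambe_ratio(g, dg, theta) for name, (g, dg) in candidates.items()}
print("bound 1/I =", 1 / info)
print(ratios)
```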


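It is interesting to note that the above theorem generalizes the Cramér-Rao inequality [1], [2]. For let

$$g_1(x, \theta) = f(x) - e_1(\theta), \tag{10}$$

where $f$ is a function on $\mathcal{X}$, and

$$e_1(\theta) = E(f \mid \theta). \tag{11}$$

We assume $g_1 \in \mathcal{G}$. Since $\partial g_1/\partial\theta = -de_1(\theta)/d\theta$ and $E(g_1^2 \mid \theta) = \operatorname{Var}(f \mid \theta)$, applying part (A) of the above Theorem to $g_1$ gives

$$\frac{\operatorname{Var}(f \mid \theta)}{(de_1(\theta)/d\theta)^2} \ge \frac{1}{E[(\partial \log p/\partial\theta)^2 \mid \theta]}. \tag{12}$$

This is the Cramér-Rao inequality. Part (B) of the above Theorem is valid under general regularity conditions, while the sign of equality in (12) holds when and only when, in addition to the regularity conditions,

$$\frac{\partial \log p(x, \theta)}{\partial\theta} = \lambda(\theta)\,(f(x) - e_1(\theta)), \tag{13}$$

where $\lambda(\theta)$ is some function of $\theta$ alone (Cramér [1], p. 480; Fend [3]). Obviously, when the sign of equality holds in (12), $g_1$ in (10) is given by

$$\lambda(\theta)\,g_1(x, \theta) = \frac{\partial \log p(x, \theta)}{\partial\theta}. \tag{14}$$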
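As a worked instance of (13) and (14) (an illustrative choice, not taken from the note), let $p(x, \theta) = \theta^{-1}e^{-x/\theta}$ for $x > 0$ and $f(x) = x$, so that $e_1(\theta) = \theta$ and $\operatorname{Var}(f \mid \theta) = \theta^2$:

```latex
\frac{\partial \log p(x,\theta)}{\partial\theta}
  = -\frac{1}{\theta} + \frac{x}{\theta^{2}}
  = \frac{1}{\theta^{2}}\,(x - \theta),
  \qquad \lambda(\theta) = \theta^{-2},
\\[4pt]
E\!\left[\left(\frac{\partial \log p}{\partial\theta}\right)^{2} \,\middle|\, \theta\right]
  = \frac{\operatorname{Var}(x \mid \theta)}{\theta^{4}} = \frac{1}{\theta^{2}},
  \qquad
\frac{\operatorname{Var}(f \mid \theta)}{(de_1/d\theta)^{2}} = \theta^{2}
  = \frac{1}{E[(\partial \log p/\partial\theta)^{2} \mid \theta]}.
```

Thus equality holds in (12), and $g_1 = x - \theta$ is a multiple of the score, as (14) requires.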


The author acknowledges with pleasure that G. A. Barnard communicated to the Royal Statistical Society, London, a result similar to the preceding Theorem, independently and at nearly the same time as this paper was written. Barnard's result is explained in [4], p. 145.







REFERENCES 

[1] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, N. J., 1954.

[2] C. R. Rao, "Information and accuracy attainable in estimation of statistical parameters," Bull. Cal. Math. Soc., Vol. 37 (1945), pp. 81-91.

[3] A. V. Fend, "On the attainment of Cramér-Rao and Bhattacharyya bounds for the variance of an estimate," Ann. Math. Stat., Vol. 30 (1959), pp. 381-388.

[4] J. Durbin, "Estimation of parameters in time series regression models," J. Roy. Stat. Soc., Ser. B, Vol. 22 (1960), pp. 139-153.





CORRECTION NOTES 


ACKNOWLEDGMENT OF PRIORITY 


By Wadie F. Mikhail
University of North Carolina

It has been brought to my attention by the editor of the Calcutta Statistical Association Bulletin that the result in my note, "An Inequality for Balanced Incomplete Block Designs" (Ann. Math. Stat., Vol. 31 (1960), pp. 520-522), was obtained seven years earlier by Purnendu Mohon Roy in his paper, "Note on the Resolvability of Balanced Incomplete Block Designs" (Calcutta Stat. Assn. Bull., Vol. 4, No. 15, October, 1952, p. 130), which was reviewed in Math. Rev., Vol. 14, No. 7, July-August 1953, p. 61. I wish to acknowledge the priority of Roy's result.




CORRECTIONS TO 
“TRUNCATION AND TESTS OF HYPOTHESES” 


By Om P. Aggarwal and Irwin Guttman
McGill University

The authors are indebted to Prof. T. W. Anderson for calling our attention to an error in the above-mentioned paper (Ann. Math. Stat., Vol. 30 (1959), p. 230). The expression (4.7) of that paper is wrong and should read

(4.7) Reject $H$ if $\bar X > K_1(\alpha, n)$ or if $\max_i X_i > a$;
Accept $H$ otherwise.

The $K_1(\alpha, n)$ will be the same as given on page 235 of the paper, since, under the hypothesis, $\Pr(\max_i X_i > a) = 0$; but unfortunately Table III, which tabulates $P(\mu)$, the power of test (4.7), should now be deleted.





ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Stanford Annual Meeting of the Institute, 
August 23-26, 1960.) 


39. On the Number of Distinct Values in a Large Sample from an Infinite, Discrete Distribution. R. R. Bahadur, Indian Statistical Institute. (By title).

Let $A_1, A_2, \ldots$ be an infinite sequence of events defined on the sample space of some experiment such that $P(A_j) > 0$ for each $j$, $P(A_jA_k) = 0$ for $j \ne k$, and $\sum_j P(A_j) = 1$. Consider a sequence of independent repetitions of the experiment, and let $T_n$ be the number of distinct events $A_j$ observed in the first $n$ trials. The paper studies the rate at which $T_n \to \infty$ as $n \to \infty$. Let $\mu_n = E(T_n)$, where $E$ denotes expected value. Theorem 1: $T_n/\mu_n \to 1$ in probability as $n \to \infty$. Suppose (with no loss in generality) that $P(A_j) \ge P(A_{j+1})$ for all $j$. Let $f(z) = \max\{j : P(A_j) \ge z\}$ for $z \le P(A_1)$ and $f(z) = 0$ (say) otherwise. Theorem 2: $\mu_n = n\int_0^\infty e^{-nz}f(z)\,dz + o(1)$ as $n \to \infty$. It follows, e.g., that if $P(A_j) = cj^{-\alpha}$, where $1 < \alpha < \infty$, then $\mu_n = \Gamma(1 - \beta)(cn)^\beta - \delta_n + o(1)$, where $0 \le \delta_n \le 1$ and $\beta = 1/\alpha$; and that if $P(A_j) = e^{-\lambda}\lambda^j/j!$, where $0 < \lambda < \infty$, then $\mu_n \sim \log n/\log\log n$. There is, however, no attainable maximum or minimum rate of increase of $\mu_n$. Theorem 3: Given $P$, there exist probability distributions $P^*$ and $P^{**}$ such that, with $\mu_n^* = E(T_n \mid P^*)$ and $\mu_n^{**} = E(T_n \mid P^{**})$,

$$\mu_n = o(\mu_n^*) \quad \text{and} \quad \mu_n^{**} = o(\mu_n) \quad \text{as } n \to \infty.$$
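Theorem 2 is easy to explore numerically, because $f(z)$ counts the $j$ with $P(A_j) \ge z$, so that $n\int_0^\infty e^{-nz}f(z)\,dz = \sum_j (1 - e^{-nP(A_j)})$, while the exact mean is $\mu_n = \sum_j [1 - (1 - P(A_j))^n]$. The sketch below compares these for the power-law case with $\alpha = 2$ (so $c = 6/\pi^2$), truncating the sequence at $J = 200{,}000$ terms; the truncation, $n = 1000$, and the function names are illustrative assumptions. The exact value should lie roughly between $\Gamma(1-\beta)(cn)^\beta - 1$ and $\Gamma(1-\beta)(cn)^\beta$, apart from a small $o(1)$ term.

```python
import math

def expected_distinct_exact(probs, n):
    """mu_n = E(T_n) = sum_j [1 - (1 - P(A_j))^n] for n independent trials."""
    return sum(1 - (1 - p) ** n for p in probs)

def expected_distinct_theorem2(probs, n):
    """n * integral_0^inf e^{-nz} f(z) dz with f(z) = #{j : P(A_j) >= z}, i.e. sum_j (1 - e^{-n P(A_j)})."""
    return sum(1 - math.exp(-n * p) for p in probs)

alpha, J, n = 2.0, 200_000, 1000
c = 6 / math.pi**2                                     # makes sum_j c j^(-2) equal to 1
probs = [c * j ** (-alpha) for j in range(1, J + 1)]   # truncated; omitted mass is about c / J
beta = 1 / alpha
asymptotic = math.gamma(1 - beta) * (c * n) ** beta    # Gamma(1 - beta) (cn)^beta
mu_exact = expected_distinct_exact(probs, n)
mu_thm2 = expected_distinct_theorem2(probs, n)
print(f"exact {mu_exact:.3f}, Theorem 2 integral {mu_thm2:.3f}, Gamma(1-beta)(cn)^beta {asymptotic:.3f}")
```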


40. Expansions for Convolutions. Reed Dawson, American Systems Inc.


Asymptotic expansions for the ordinate and tail area in the distribution of the standardized sum of a large number of independent and identically distributed random variables are developed from the Edgeworth series of Cramér. The formula for the ordinate extends results of Daniels (Ann. Math. Stat., Vol. 25 (1954), pp. 631-650) and Good (Ann. Math. Stat., Vol. 28 (1957), pp. 861-881). The expansion of the tail area is believed to be new.
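For orientation, the sketch below shows the classical first-order Edgeworth correction for the ordinate (density), the standard starting point for expansions of this kind; it is not the new tail-area expansion described in the abstract. The example, a standardized sum of $n = 10$ exponential variables whose exact density is a rescaled gamma density, and the helper names are illustrative assumptions.

```python
import math

def normal_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def edgeworth_pdf(z, n, skewness):
    """phi(z) * [1 + skewness / (6 sqrt(n)) * He3(z)], He3(z) = z^3 - 3z: one-term Edgeworth ordinate."""
    return normal_pdf(z) * (1 + skewness / (6 * math.sqrt(n)) * (z**3 - 3 * z))

def exact_pdf_exp_sum(z, n):
    """Exact density of (S - n) / sqrt(n), where S is a sum of n Exp(1) variables (Gamma(n, 1))."""
    s = n + math.sqrt(n) * z
    if s <= 0:
        return 0.0
    log_gamma_pdf = (n - 1) * math.log(s) - s - math.lgamma(n)
    return math.sqrt(n) * math.exp(log_gamma_pdf)

n, skew = 10, 2.0  # Exp(1) has standardized third cumulant 2
for z in (-1.0, 0.0, 1.0):
    print(z, exact_pdf_exp_sum(z, n), normal_pdf(z), edgeworth_pdf(z, n, skew))
```

At $z = \pm 1$ the corrected ordinate is markedly closer to the exact gamma density than the normal ordinate; at $z = 0$ the first-order term vanishes, so higher-order terms would be needed there.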


41. On Sufficient Conditions for Consistent Parameter-Estimates in a Stochastic Difference Equation with Regression on Several Lagged and Non-Stochastic Variables. Friedhelm Eicker, University of North Carolina. (By title).


The least squares estimates $a_i$ of $\alpha_i$ and $b_j$ of $\beta_j$ in the stochastic difference equation $y_t = \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + \beta_1 z_{1t} + \cdots + \beta_q z_{qt} + \epsilon_t$, $t = 1, 2, \ldots$, where $y_0, y_{-1}, \ldots, y_{-p+1}$ and all $z_{jt}$ are given constants, are shown to be consistent if conditions (A)-(C2) hold: (A) The disturbances $\epsilon_t$ are independent with 0 means, and 2nd and 4th moments bounded between two positive constants uniformly in $t$. (B) The roots of $\rho^p - \alpha_1\rho^{p-1} - \cdots - \alpha_p = 0$
are within the unit-circle (stationarity). With X(N, q) = (z;,) andd,(P) = largest eigen- 
value of P = X’X it is assumed further: (Cl) If the set {N| of those N for 
which NA, (P) 2 g(N), with some g(N) — 0 as N — @, is infinite, let 


(N + N[AV(P)] + A (P))~* Amin (K'K) — &, N in {N}, 


where $K(N, (p+1)q) = (X, LX, \ldots, L^pX)$, $L(N, N)$ containing 1's in its left subdiagonal and 0 elsewhere. (C2) If the complement $\{\bar N\}$ to $\{N\}$ is infinite, $\lambda_m(P) \to \infty$ suffices for $a_i \to \alpha_i$ i.p. on $\{\bar N\}$. For $b_j \to \beta_j$ on $\{\bar N\}$ it suffices to have, in addition, that the elements of $P^{-1}X'L^jX$ are uniformly bounded for all $j$. This theorem is proved using only elementary matrix theory. Simplified conditions are easily derived. They allow even exponentially increasing regression vectors. This case is not included in what seem to be the broadest conditions known so far (Grenander, Ann. Math. Stat., 1954). The methods applied lead also to statements about asymptotic distributions, the explosive case and other questions.
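A small simulation illustrates the kind of consistency result described, in the simplest case $p = q = 1$: $y_t = \alpha y_{t-1} + \beta z_t + \epsilon_t$ with $|\alpha| < 1$ (condition (B)), independent standard normal disturbances, and a bounded non-stochastic regressor $z_t = \cos t$. The regressor, parameter values, seed and function names are assumptions for illustration only; the sketch does not check conditions (C1)-(C2).

```python
import math
import random

def simulate(alpha, beta, n, rng, y0=0.0):
    """y_t = alpha * y_{t-1} + beta * z_t + eps_t with the fixed regressor z_t = cos(t)."""
    ys, zs = [y0], []
    for t in range(1, n + 1):
        z = math.cos(t)
        zs.append(z)
        ys.append(alpha * ys[-1] + beta * z + rng.gauss(0.0, 1.0))
    return ys, zs

def least_squares(ys, zs):
    """Solve the 2x2 normal equations for (a, b) in y_t ~ a * y_{t-1} + b * z_t."""
    lag, cur = ys[:-1], ys[1:]
    s_ll = sum(u * u for u in lag)
    s_lz = sum(u * z for u, z in zip(lag, zs))
    s_zz = sum(z * z for z in zs)
    s_ly = sum(u * y for u, y in zip(lag, cur))
    s_zy = sum(z * y for z, y in zip(zs, cur))
    det = s_ll * s_zz - s_lz**2
    a = (s_ly * s_zz - s_lz * s_zy) / det  # Cramer's rule
    b = (s_ll * s_zy - s_lz * s_ly) / det
    return a, b

rng = random.Random(7)
for n in (100, 10_000, 100_000):
    print(n, least_squares(*simulate(0.5, 2.0, n, rng)))
```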


42. Multivariate Extremal Distributions. E. J. Gumbel, Columbia University. (By title).


A bivariate probability function $F(x, y)$ with margins $F_1(x)$ and $F_2(y)$ is obtained by writing $[-\log F(x, y)]^m$ equal to the sum of the corresponding expressions for $F_1(x)$ and $F_2(y)$, where $m \ge 1$. It is shown that the asymptotic bivariate probability function of largest values $\Phi(x, y)$ taken from $F(x, y)$ has the same form, provided that the marginal distributions possess asymptotic extremal distributions. By analogy, a bivariate probability function $\Pi(x, y)$ for values exceeding $x$ and $y$ is linked to $\Phi(x, y)$ by $\Pi(x, y) = \Phi(-x, -y)$. These expressions can easily be generalized to $n$ dimensions.
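A minimal sketch of the construction, assuming standard Gumbel (type I) margins for concreteness; the margins and function names are illustrative choices, not taken from the abstract. With $m = 1$ the two components are independent, letting one argument grow recovers a margin, and with Gumbel margins $F(x, y)^n = F(x - \log n, y - \log n)$, so the form is preserved when taking maxima, in line with the statement about largest values.

```python
import math

def gumbel_cdf(x):
    """Standard Gumbel (type I extreme-value) c.d.f., used here as an illustrative margin."""
    return math.exp(-math.exp(-x))

def bivariate_cdf(x, y, m, f1=gumbel_cdf, f2=gumbel_cdf):
    """F(x, y) defined by [-log F]^m = [-log F1(x)]^m + [-log F2(y)]^m, with m >= 1."""
    if m < 1:
        raise ValueError("m must be at least 1")
    s = (-math.log(f1(x))) ** m + (-math.log(f2(y))) ** m
    return math.exp(-s ** (1.0 / m))

for m in (1.0, 2.0, 5.0):  # larger m gives stronger positive dependence
    print(m, bivariate_cdf(0.5, 0.5, m), gumbel_cdf(0.5) ** 2)
```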


43. Tolerance Regions. Irwin Guttman, McGill University. (Invited Paper).


A discussion of distribution-free and $\beta$-expectation tolerance regions for fixed-sample cases and sequential sampling schemes is given. Let $X_1, \ldots, X_n$ be a sample of $n$ independent observations on $X$. Definition. $S(X_1, \ldots, X_n)$ is a distribution-free tolerance region if the induced probability distribution of the coverage of $S$ is independent of the probability distribution of $X$. Definition. $S(X_1, \ldots, X_n)$ is a $\beta$-expectation tolerance region if the expected value of its coverage is $\beta$.

A connection between tolerance regions and the concept of Best Population (cf. the work of Gupta et al.) is indicated. Definition. Suppose there are $k$ populations that are distributed by $P_{\theta_i}$, $i = 1, \ldots, k$, respectively. Let $C_i = \int_A dP_{\theta_i}$, where $A$ is fixed and known in advance. Then the $k$ populations are said to contain a best population if and only if there exists an ordering of the $C_i$ such that $C_{[k]} > C_{[k-1]} \ge \cdots \ge C_{[1]}$.
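The two definitions can be illustrated with the classical interval from the sample minimum to the sample maximum: its coverage $F(X_{(n)}) - F(X_{(1)})$ has the same (beta) distribution for every continuous $F$, so it is distribution-free, and its expected coverage is $(n-1)/(n+1)$, so it is a $\beta$-expectation region with $\beta = (n-1)/(n+1)$. The simulation below (sample size, parent distributions, seed and function names are illustrative assumptions, and this example is not taken from the paper) estimates the mean coverage for two different parent distributions.

```python
import math
import random

def coverage_of_range(sample, cdf):
    """Probability content F(max) - F(min) of the interval [X_(1), X_(n)]."""
    return cdf(max(sample)) - cdf(min(sample))

def mean_coverage(n, reps, rng, draw, cdf):
    total = 0.0
    for _ in range(reps):
        total += coverage_of_range([draw(rng) for _ in range(n)], cdf)
    return total / reps

n, reps = 9, 20_000
rng = random.Random(43)
exp_cov = mean_coverage(n, reps, rng, lambda r: r.expovariate(1.0), lambda x: 1 - math.exp(-x))
unif_cov = mean_coverage(n, reps, rng, lambda r: r.random(), lambda x: x)
print(exp_cov, unif_cov, (n - 1) / (n + 1))  # both estimates should be near 0.8
```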


44. Estimation of the Scale Parameter in the Weibull Distribution by Means of a Life Test with Censoring both by Time and by Number of Failures. Eugene H. Lehman, Jr., North Carolina State College. (Introduced by R. L. Anderson).


The distribution of life spans of certain classes of individuals is assumed to be Weibull with two parameters: a shape parameter assumed known and a scale parameter to be estimated. The maximum likelihood estimator of the scale parameter is derived in a test in which $N$ items are subjected to test. The test continues until $R$ (less than $N$) items have failed and a minimum time $T$ has elapsed. The bias, small sample variance, mean square error, asymptotic mean square error, cost and price of the estimator are determined, where cost is defined as a linear combination of $N$ and the expected duration of the test, and price is the product of cost and mean square error. By means of calculations on an IBM 650, actual values of these characteristics are determined for certain values of $R$, $N$, $T$ and the shape parameter.
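A sketch of a maximum likelihood estimator for this stopping rule, under assumptions not stated in the abstract: the Weibull c.d.f. is taken as $1 - \exp[-(t/\lambda)^k]$ with known shape $k$ and scale $\lambda$, and the test ends at $\tau = \max(T, t_{(R)})$, where $t_{(R)}$ is the $R$th failure time, with survivors censored at $\tau$. Because $t^k$ is then exponential with mean $\lambda^k$, the estimate of $\lambda^k$ is the total transformed time on test divided by the number of failures. Function names and simulated parameter values are illustrative; Python's `random.weibullvariate(alpha, beta)` uses `alpha` as the scale and `beta` as the shape.

```python
import random

def weibull_scale_mle(times, shape, R, T):
    """MLE of lambda for F(t) = 1 - exp(-(t / lambda)**shape), shape known, when the test on
    N = len(times) items ends at tau = max(T, R-th failure time) and survivors are censored at tau."""
    ordered = sorted(times)
    tau = max(T, ordered[R - 1])
    failures = [t for t in ordered if t <= tau]
    n_censored = len(ordered) - len(failures)
    # t**shape is exponential with mean lambda**shape: total transformed time on test / failures
    theta_hat = (sum(t**shape for t in failures) + n_censored * tau**shape) / len(failures)
    return theta_hat ** (1.0 / shape)

rng = random.Random(650)
lam, k, N = 2.0, 1.5, 2000
times = [rng.weibullvariate(lam, k) for _ in range(N)]
print(weibull_scale_mle(times, k, R=1000, T=1.0))  # here the failure count ends the test
print(weibull_scale_mle(times, k, R=100, T=1.0))   # here the minimum time ends the test
```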


(Abstracts of papers not presented at any meeting of the Institute.)

1. On the Foundations of Statistical Inference, II. (Preliminary Report). Allan Birnbaum, New York University.


Let $E = (p_{ij})$ be any stochastic matrix ($\sum_j p_{ij} = 1$ for each $i$), $i = 1, \ldots, k$ ($k$ finite), $j = 1, \ldots, m$ (possibly infinite). Then $E$ is the mathematical model of a statistical experiment for $k$ simple hypotheses: $\operatorname{Prob}(X = j \mid H_i) = p_{ij}$. For any fixed $k$, $E$ is simple if it is (or is equivalent to) such a matrix with $m = k$. A simple experiment is cyclic-symmetric (c.s.) if it can be represented by a c.s. square matrix ($p_{ij} = p_{i-1, j-1}$, with any subscript 0 replaced by $k$). Any experiment is called c.s. if it has a representation $E = (p_{ij}) = (Q_1, Q_2, \ldots)$ where each $Q_r$ is square c.s. Lemma 1: Each c.s. experiment is equivalent to a mixture of c.s. simple experiments. Lemma 2: Every experiment is a component of some c.s. experiment. Hence for typical purposes of informative inference, any outcome $X = j$ of any experiment $E = (p_{ij})$ can and should be interpreted as an outcome of the essentially unique simple c.s. experiment having a column proportional to the $j$th column of $E$. The structure of $E$ is irrelevant to such interpretations except through its $j$th column, which is the likelihood function determined by outcome $j$. Such interpretations can be expressed exclusively, if desired, in terms of error probabilities defined in the simple c.s. experiment and admitting frequency interpretations; such interpretations include point and confidence-set estimates. A formal correspondence exists between some such interpretations and inferences based on formally postulating equal prior probabilities; this gives a constructive explication of the traditional Bayesian "principle of indifference." A formal correspondence exists also between such interpretations in confidence-set form and the statements obtainable by formal application of Fisher's "fiducial argument" (which is possible in any simple c.s. experiment).
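The simple c.s. experiment attached to an outcome can be written down directly. In the sketch below (function name hypothetical; indices 0-based), a likelihood column $(L_1, \ldots, L_k)$ is normalized to sum to one and placed as the first column of a $k \times k$ circulant matrix, $p_{ij} = c_{(j - i) \bmod k}$, which satisfies $p_{ij} = p_{i-1, j-1}$ with indices taken cyclically. Every column of such a matrix sums to one, so a column proportional to the likelihood must equal the normalized likelihood; this is one way to see why the construction is essentially unique.

```python
def cyclic_symmetric_experiment(likelihood_column):
    """k x k circulant stochastic matrix p[i][j] = c[(j - i) % k] whose column 0 is
    proportional to the given likelihood column (one entry per hypothesis H_1, ..., H_k)."""
    k = len(likelihood_column)
    total = sum(likelihood_column)
    if total <= 0 or min(likelihood_column) < 0:
        raise ValueError("likelihoods must be non-negative with a positive sum")
    # choose c so that p[i][0] = c[(-i) % k] = likelihood_column[i] / total
    c = [likelihood_column[(-m) % k] / total for m in range(k)]
    return [[c[(j - i) % k] for j in range(k)] for i in range(k)]

experiment = cyclic_symmetric_experiment([0.2, 0.5, 0.3, 0.6])  # hypothetical likelihoods
for row in experiment:
    print([round(p, 4) for p in row])
```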


2. Trees and Negative Estimates of Variance Components. W. A. Thompson, Jr., University of Delaware. (By title).


This paper provides an algorithm for solving the problem of negative estimates of variance components (see Abstract 69, Ann. Math. Stat., 1960) for all random effects models whose expected mean square column may be thought of as forming a mathematical tree in a certain sense. The algorithm is as follows. Consider the minimum mean square in the entire array; if this mean square is the root of the tree, then equate it to its expectation. If the minimum mean square is not the root, then pool it with its predecessor. In either case the problem is reduced to an identical one having one fewer variable, and hence in a finite number of steps the process will yield estimates of the variance components. These estimates are non-negative and have a maximum likelihood property.
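A minimal sketch of the algorithm for the special case in which the tree is a single chain (a fully nested classification), under assumptions the abstract does not spell out: node 0 is the root, whose expected mean square is $\sigma_0^2$; each node $i > 0$ has expected mean square equal to that of its predecessor plus $k_i\sigma_i^2$; and "pooling" replaces two mean squares by their degrees-of-freedom weighted average, with the variance component of the absorbed node set to zero. The function name and the numbers in the example are hypothetical.

```python
def thompson_chain(mean_squares, dfs, coefs):
    """Non-negative variance components for a chain-shaped tree.

    Node 0 is the root with E(MS_0) = coefs[0] * s2_0 (normally coefs[0] = 1); node i > 0 has
    E(MS_i) = E(MS_{i-1}) + coefs[i] * s2_i.
    """
    groups = [{"ms": ms, "df": df, "nodes": [i]}
              for i, (ms, df) in enumerate(zip(mean_squares, dfs))]
    s2 = [0.0] * len(mean_squares)
    offset = 0.0  # fitted expectation of the predecessor of the current root group
    while groups:
        i = min(range(len(groups)), key=lambda j: groups[j]["ms"])
        if i == 0:
            # Minimum is the root: equate it to its expectation; absorbed nodes keep s2 = 0.
            lead = groups[0]["nodes"][0]
            s2[lead] = (groups[0]["ms"] - offset) / coefs[lead]
            offset = groups.pop(0)["ms"]
        else:
            # Minimum is not the root: pool it with its predecessor (df-weighted average).
            pred, cur = groups[i - 1], groups.pop(i)
            pred["ms"] = (pred["df"] * pred["ms"] + cur["df"] * cur["ms"]) / (pred["df"] + cur["df"])
            pred["df"] += cur["df"]
            pred["nodes"] += cur["nodes"]
    return s2

# One-way random model: error MS 10 on 20 df; between-groups MS on 4 df, 5 observations per group.
print(thompson_chain([10.0, 6.0], [20, 4], [1.0, 5.0]))   # pooled: about [9.333, 0.0]
print(thompson_chain([10.0, 30.0], [20, 4], [1.0, 5.0]))  # usual estimates: [10.0, 4.0]
```

In the first example the unrestricted estimate of the between-groups component would be $(6 - 10)/5 < 0$; pooling replaces it by zero and re-estimates the error component from the pooled mean square.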








NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items


H.J. Arnold has accepted an appointment as Assistant Professor in the Mathe- 
matics Department of Wesleyan University, Middletown, Connecticut, having 
resigned from a similar position at the University of Western Ontario, London, 
Canada. 

Dr. Max Astrachan has resigned his position as Head of the Department of 
Accounting and Statistics, School of Business, Air Force Institute of Technology, 
to join the Logistics Department of the RAND Corporation in Santa Monica, 
California. 

Richard E. Barlow received the Ph.D. degree from Stanford University in 
Mathematical Statistics and has accepted a position with the Communications 
Research Division of the Institute for Defense Analyses at Princeton University. 

Dr. Gerald W. Barnes has accepted a position as Associate Professor of 
Psychology at the University of Arkansas, effective September 1, 1960. 

Dr. Robert J. Buehler, Statistical Laboratory, Iowa State University, has been appointed as Visiting Lecturer in the Department of Statistics at the University of Minnesota for part of the summer term. He has also been promoted from the rank of Assistant Professor to Associate Professor.

Donald L. Burkholder of the Department of Mathematics, University of 
Illinois, has been promoted to Associate Professor. 

Dr. Foster B. Cady of North Carolina State College has been appointed as Assistant Professor in Statistics at Iowa State University beginning July 1, 1960.

Richard L. Carter has been appointed Professor of Management Engineering 
at Rensselaer Polytechnic Institute. Dr. Carter was formerly Associate Professor 
of Industrial Engineering at the Illinois Institute of Technology. 

Dr. I. M. Chakravarti, on leave of absence from the Indian Statistical Institute, who has been a Visiting Assistant Professor in the Department of Statistics, University of North Carolina, has now accepted the position of Visiting Professor of Statistics in the Department of Mathematics, Case Institute of Technology, for one year commencing September, 1960.

Jack Chassan has accepted the position of Mathematical Statistician in the Office of Education of the Department of Health, Education and Welfare. He has also recently been elected a Fellow of the American Association for the Advancement of Science.

S. D. Chatterji was awarded the Ph.D. degree in Statistics this June by Michigan
State University and has now accepted a lectureship in Mathematics at the
University of New South Wales for a period of three years, beginning
August, 1960.

Clyde H. Coombs, Professor of Psychology at the University of Michigan, is 

1219 





1220 NEWS AND NOTICES 


on leave this year at the Center for Advanced Study in the Behavioral Sciences. 

Richard G. Cornell is now an Associate Professor in the Department of Statis- 
tics of the Florida State University, Tallahassee, Florida. He was formerly 
Chief of the Laboratory and Field Station Statistics Unit, Communicable 
Disease Center, Atlanta, Georgia. 

John W. Cotton, formerly of Northwestern University, has been appointed 
Associate Professor of Psychology at the University of California, Santa Barbara. 

Alfred E. Croftin, Jr., formerly an instructor at S.M.U., has accepted an 
Assistant Professorship at Pan American College in Edinburg, Texas. 

Professor Herbert David, Department of Statistics, Iowa State University,
has been promoted from the rank of Assistant Professor to Associate Professor.

Dr. Reed Dawson has left government service to join American Systems 
Incorporated, a new electronic engineering company in Inglewood, California. 

Satya Deva Dubey received the Ph.D. degree in Statistics from Michigan
State University with the dissertation “Contributions to Statistical Theory of Life
Testing and Reliability,” and is now a Senior Mathematical Statistician in the
Department of Mathematics and Statistics at The Procter and Gamble Com- 
pany, Cincinnati, Ohio. 

Kenneth Ferrin recently completed the requirements for the Ph.D. degree in 
Mathematical Statistics at U.C.L.A. and has assumed a position as an Associ- 
ate Mathematician at the IBM Research Center, Yorktown Heights, New York. 

Raymond I. Fields received the degree of Doctor of Philosophy, with a major 
in Statistics, from the Virginia Polytechnic Institute on June 12, 1960. He has 
been director of the Computing Laboratory, University of Louisville, since
January 1959. 

Dr. M. Fisz will be a visiting professor at the University of Washington during 
the period September 1960 to June 1961. 

Mr. Fred Frishman, an employee of the U.S. Navy for more than 10 years 
and head of the Mathematics Division of the Naval Propellant Plant since 1954, 
recently accepted an appointment as a mathematician with the Army Research 
Office, Office of the Chief of Research and Development, Department of the 
Army. 

William R. Gaffey, formerly with the Division of Biostatistics of the Univer- 
sity of California School of Public Health, has accepted a position as Statistical 
Consultant with the California State Department of Public Health, Berkeley 4,
California. 

D. W. Gaylor received a Ph.D. in Experimental Statistics at North Carolina 
State College and has accepted a position with the Vallecitos Atomic Laboratory, 
General Electric Co., Pleasanton, California.

Leon Gilford has accepted a position as Mathematical Statistician with Opera- 
tions Research Incorporated, Silver Spring, Maryland. He was formerly Chief of 
the Operations Research Branch, Statistical Research Division, U.S. Bureau of 
the Census. 

Dr. John Gurland of the Department of Statistics, Iowa State University, has
been appointed as Visiting Professor at the U.S. Army Mathematics Research
Center, University of Wisconsin, for the period July 1, 1960 to June 30, 1961.

Preston C. Hammer of the Mathematics Department of the University of 
Wisconsin has returned from a research leave during the academic year 1959- 
1960 in Zurich, Switzerland. 

Dr. H. O. Hartley, Department of Statistics, Iowa State University, is spending
the summer in England, where he and Dr. E. S. Pearson are at work on the
second volume of Biometrika Tables.

Bruce M. Hill has been appointed lecturer in the Department of Public 
Health Statistics of the University of Michigan, Ann Arbor, Michigan. 

Terry A. Jeeves has been promoted to the position of Fellow Mathematician 
in the Mathematics Department of the Westinghouse Research Laboratories in 
Pittsburgh. He is presently concerned with consulting on statistical problems and 
doing research on the logical design of computers. 

Dr. Cecil L. Kaller has accepted a position as assistant professor of Mathe- 
matics at the University of Saskatchewan for the fall term. 

Professor Leo Katz has returned to his regular position at Michigan State 
University where he is Head of the Department of Statistics. He has spent the 
past academic year in the London Branch Office of ONR as Scientific Liaison 
Officer, visiting universities and research centers in Western Europe and other 
places where significant work in mathematical statistics was being performed. 

Professor Oscar Kempthorne, Department of Statistics, Iowa State University,
has been granted the research degree of Doctor of Science (Sc.D.) by Cambridge
University, England, on the recommendation of the faculty of mathematics and
the Board of Research Studies.

Susumu Kikuchi, formerly a lecturer at the Osaka City University, Japan, 
received his D. Eng. in Administrative Engineering from Keio University in 
March, 1960. He has joined the School of Engineering of the Okayama Univer- 
sity, Japan, as Associate Professor of Mechanical Engineering. 

Dr. Samuel J. Kilpatrick of Ireland is presently a Postdoctoral Fellow at the 
Statistical Laboratory, Iowa State University. 

Dr. Allyn W. Kimball, Jr. has resigned his position as Chief of the Statistics 
Section of the Mathematics Panel at Oak Ridge National Laboratory and has 
been appointed Professor and Chairman of the Department of Biostatistics in 
the School of Hygiene and Public Health at Johns Hopkins University. Dr. Kim- 
ball has also been appointed Professor of Biomathematics in the School of 
Medicine.

Harold J. Larson has resigned his position as instructor at Iowa State University
and is now working as a Mathematical Statistician with Stanford Research
Institute, Fort Ord, California.

James A. Lechner, having been relieved of active duty in the Army, has 
joined the staff of the Mathematics Department of the Westinghouse Research 
Laboratories in Pittsburgh, Pennsylvania. He will be concerned with studies of 
the design of experiments and probability applications.

Mrs. Leone Y. Low, formerly of Oklahoma State University, has accepted a 
temporary position as instructor in the Mathematics Department, University 
of Illinois. 

John W. Mayne, formerly Chief, Operations Research Branch, Air Defense 
Division at SHAPE, has recently been posted to SHAPE Air Defense Technical 
Centre, The Hague, Netherlands as Deputy Group Leader of the Systems 
Evaluation Group. He continues to be a member of Canada’s Defense Research 
Board on special duty in Europe. 

Gerard T. McLoughlin has recently joined the Mathematics Department at 
Baird-Atomic. 

Mr. Clark T. Miller has accepted a position on the technical staff of the 
MITRE Corporation, Bedford, Massachusetts. 

Donald F. Mills (Ph.D., University of Washington, 1957) has joined the staff 
of the V.A. Hospital, Knoxville, Iowa, as a Counseling Psychologist.

Dr. Joseph Moder, Georgia School of Technology, is presently a Postdoctoral 
Fellow at the Statistical Laboratory, Iowa State University.

Roger H. Moore will begin work toward a Ph.D. degree in the Fall of 1960 at 
the Oklahoma State University. He will be studying under the Advanced Study 
Program of the Los Alamos Scientific Laboratory. 

Dr. Wiktor Oktaba of Poland is presently a Postdoctoral Fellow at the Statistical
Laboratory, Iowa State University.

Stephen Peters has established an independent consulting actuary office at 64 
Mt. Vernon Street, Cambridge, Mass. 

William E. Pruitt received his Ph.D. from Stanford University in June, 1960. 
He is now Assistant Professor of Mathematics at the University of Minnesota.

Ronald Pyke, formerly of Columbia University, is presently Assistant Pro- 
fessor in the Department of Mathematics, University of Washington. 

Dana Quade received his Ph.D. in Statistics from the University of North 
Carolina in June 1960, and is now in the Public Health Service stationed at the 
Communicable Disease Center, Atlanta, Georgia. 

Charles P. Quesenberry has joined the staff of the Mathematics Department, 
Montana State College, Bozeman, where he will be teaching and doing research. 

Enders A. Robinson of the Mathematics Department of the University of 
Wisconsin will spend the academic year 1960-61 at Statistiska Institutionen vid 
Uppsala Universitet, Jarnbrogatan 18, Uppsala, Sweden. 

Jack Sawyer has been appointed assistant professor in the Departments of 
Psychology and Sociology at the University of Chicago, where he was previously 
postdoctoral fellow and lecturer. 

Martin Schatzoff has been awarded a one-year scholarship by IBM and will
be attending Harvard University, Graduate School of Arts and Sciences,
Department of Statistics.


Professor Seymour Sherman of the Moore School of Electrical Engineering,
University of Pennsylvania, has resigned and accepted a professorship in the
Department of Mathematics at Wayne State University.

Dr. Ervin P. Smith, Associate Professor at Montana State College, has been
appointed as Visiting Associate Professor in Statistics at Iowa State University
for the academic year 1959-1960.

Dr. John H. Smith, Professor of Statistics, is on sabbatical leave from the 
American University to study mathematics at the University of Chicago. He will 
return to the American University in September, 1961. 

Dr. Milton Sobel of the Bell Laboratories and New York University has 
joined the staff of the Statistics department at the University of Minnesota as 
an associate professor. 

Dr. Melvin D. Springer has resigned from the staff of Technical Operations,
Inc., Fort Monroe, Virginia, to accept a position as Systems Operations Analyst
with the Defense Systems Division of General Motors Corporation, Warren, 
Michigan. 

Robert G. D. Steel was appointed as Professor of Experimental Statistics, 
North Carolina State College, September 1, 1960. Dr. Steel has been on the staff 
of Cornell University for several years. At Raleigh he will be engaged in teaching, 
research and consulting activities of the Department. 

David S. Stoller, Logistics Department, The RAND Corporation, received a
part-time appointment to the faculty of the Graduate School of Business
Administration, University of California, Los Angeles, as lecturer (Associate
Professor) for the academic year 1960-61. He is the instructor in the “Seminar
in Operations Analysis” for the fall semester.

Dr. L. Takacs has been promoted to the position of Associate Professor at 
Columbia University. 

Howard G. Tucker has concluded a year’s leave of absence at the Statistical 
Laboratory at the Berkeley Campus and has resumed his teaching position in 
the Department of Mathematics at the University of California, Riverside. 

Dr. David van Tijn has been appointed Director of Research of Applied 
Science, Incorporated. Dr. van Tijn was formerly Senior Staff Scientist. 

G. A. Watterson, formerly a research scholar at the Australian National 
University, has taken up an appointment as Associate Professor of Statistics 
at the Virginia Polytechnic Institute, Blacksburg, Virginia. 

Oscar Wesler has been promoted to Associate Professor of Mathematics at
the University of Michigan, and spent the summer of 1960 as Visiting Associate
Professor of Statistics at Stanford University.

Frank Wilcoxon, formerly in charge of the Statistical Laboratory, Lederle 
Labs. division of American Cyanamid Co. at Pearl River, N. Y., has accepted a 
position as Professor of Statistics at Florida State University, Tallahassee, Florida.

Dr. W. H. Williams has resigned from his position in the Department of
Mathematics, McMaster University, to join the Mathematics Research Group
at the Bell Telephone Laboratories, Murray Hill, New Jersey.

Professor Leroy Wolins, Department of Statistics, Iowa State University, has 
been promoted from the rank of Assistant Professor to Associate Professor. 

William Wolman received the Ph.D. degree in mathematics and statistics at 
the University of Rochester in June 1960. He has transferred from the Navy 
Department and has joined the newly organized Office of Reliability and Systems 
Analysis of the National Aeronautics and Space Administration, as chief stat- 
istician. 




NEW MEMBERS 


The following persons have been elected to membership in the Institute.


Barndorff-Nielsen, Ole Eiler, Mag. Scient. (Århus Universitet); Assistant Professor,
Matematisk Institut, Århus Universitet; Tagtvej 30, Højbjerg, Denmark.

Brøns, Hans K., Mag. Scient. (University of Copenhagen); Assistant Professor,
Mathematical Institute, University of Copenhagen; Petersborgvej 2, Copenhagen 9, Denmark.

Cho, Harry H., B.S. (M. I. T.); Statistician-Operations Research Analyst, Laboratory for 
Electronics, Inc.; 1079 Commonwealth Ave., Boston 15, Massachusetts. 

Choi, Keewhan, M. A. in Statistics (Harvard University); 202 B Holden Green, Cambridge 
88, Massachusetts. 

Cohen, Leo J., A. B. Mathematics (Lehigh University); Head, Mathematical Analysis 
Group, Burroughs Research, Paoli, Pennsylvania; 6228 N. Third Street, Philadelphia, 
Pennsylvania. 

Farquhar, Rex D., B. A. (Illinois College); Quality Control Engineer, Aerojet-General 
Corporation; 1064 E. Granada Ct., Ontario, California. 

Fink, Donald A., M. A. (Yale University); Research Associate, Graduate School of In- 
dustrial Administration, Carnegie Institute of Technology, Schenley Park, Pittsburgh 13, 
Pennsylvania. 

Ginsburg, Seymour, Ph.D. Math. (University of Michigan); Senior Mathematician, System 
Development Corporation, Santa Monica, California; 2417 W. 165th Street, Gardena, 
California. 

Gould, Henry W., M. A. (University of Virginia); Instructor in Mathematics, West Virginia 
University, Department of Mathematics, Morgantown, West Virginia. 

Greenberg, Leonard, M. A. (Columbia University); Development Engineer, Burroughs 
Research Center, Paoli, Pennsylvania. 

Halbrecht, Herbert Z., M. B. A. (University of Chicago); President, Herbert Halbrecht
Associates, Incorporated, 382 S. Michigan Avenue, Chicago 4, Illinois.

Heck, David L., M.A. (University of North Carolina); Associate Mathematician, Inter- 
national Business Machines Corporation, Development Laboratory, Scientific Computation 
Lab, Dept. 284, Endicott, New York. 

Hill, Franklin A., B.S. (Illinois Institute of Technology); Quality Control Specialist,
Air Materiel Command, U. S. Air Force; 8029 Prairie Ave., Chicago 19, Illinois.

Hiller, Norman H., B.S. (Lafayette College); Quality Control Engineer, Olin Mathieson 
Chemical Corporation; 8 Wood Ave., Milford, Connecticut. 

Ifram, Adnan Farhan, B.Sc. (American University of Beirut, Beirut, Lebanon); Assistant, 
Mathematics Department, University of Illinois, Urbana, Illinois. 

Jowett, Geoffrey Harcourt, Ph.D. (Sheffield); Associate Director, Technology Laboratory, 
University of Melbourne, Parkville N3, Victoria, Australia.

Khlentzos, Michael Theodore, M.D. (St. Louis University School of Medicine); Director,
McAuley Clinic, St. Mary’s Hospital, 2200 Hayes Street, San Francisco 17, California;
11 Middlefield Drive, San Francisco 27, Calif.

Kirk, Jerome R., Student, Reed College, Portland 2, Oregon; 26 Baker Drive, Urbana, 
Illinois. 

Lange, William E., B.S. (Wyoming University); Research Analyst and Account Manager, 
Coleman Investments, 1005 Union Center, Wichita, Kansas. 

Marcus, Leslie F., M.A. (University of California); Student Mathematical Statistician, 
Forest Service, U. S. Dept. of Agriculture, Statistical Laboratory, University of
California, Berkeley, California. 

Nagai, Takeaki, M.Sc. (Faculty of Science, Kyushu University); Assistant, Mathematical
Institute, Faculty of Science, Kyushu University, Fukuoka City, Fukuoka Prefecture,
Japan. 

Neveu, Jacques J. P., Doctor of Sciences (University of Paris); Maître de Conférences,
University of Paris, 11 Rue Pierre Curie, Paris 5, France.

Nomachi, Yukio, Graduate, Mathematics Department, Faculty of Science (Kyushu University);
Assistant Professor, Shimonoseki City College of Commerce, Kifune-cho, Shimonoseki
City, Japan.

Sabto-Agami, Jacob C., E.E. (Columbia University); Development Engineer, ITT Laboratories,
390 Washington Avenue, Nutley 10, New Jersey; 340 Haven Avenue, New
York 338, New York. 

Stephanides, Agatha S., B.A. (George Washington University); Mathematical Statistician,
Navy Dept., Bureau of Yards and Docks; 9920 Georgia Ave., Silver Spring, Maryland.

Wann, Marie D., Ph.D. (Columbia University); Mathematical Statistician, Census Bureau, 
Washington 25, D.C.; 6704 21st Place, Washington 21, D. C. 

Yu, Ruth Li, M.S. Business Statistics (University of Missouri); Senior Programmer, Rem- 
ington Rand Univac, 19th and Allegheny Ave., Philadelphia 29, Pennsylvania; School 
Lane House, Apt. 1109, 5450 Wissahickon Ave., Philadelphia 44, Pennsylvania. 

Yu, Tom Teng-Pin, Ph.D. (New York University); Manager of Research Department, 
Remington-Rand Univac, 19th and Allegheny Ave., Philadelphia 44, Penn.; School 
Lane House, Apt. 1109, 5450 Wissahickon Ave., Philadelphia 44, Penn.




A STUDY OF THE DESIGN OF FACILITIES FOR MATHEMATICS 


As its first project, the Conference Board of the Mathematical Sciences will
conduct a study of the design of buildings and facilities for mathematics;
Educational Facilities Laboratories, established by the Ford Foundation in New
York City, has agreed to provide the necessary funds.

There are many reasons why it is desirable to make a study of the design of 
facilities for the mathematical sciences at the present time. In the first place, 
mathematics has been very poorly housed in the past. In the second place, 
enrollments are now expanding rapidly. Many colleges and universities have 
five times as many majors in mathematics as they had only four or five years 
ago, and the great increases in enrollments for the nation as a whole are still 
to come. 

In the third place, a study of the design of mathematics facilities is appropriate 
because of the many changes that have taken place in the mathematical sciences. 
The project will undertake a study of the design of facilities to support the total 
activities of the mathematical sciences. These activities include research and 
instruction in pure mathematics, applied mathematics, and statistics; preparation
of the manuscripts of research papers; preparation of the manuscripts of
textbooks and expository manuscripts for instructional purposes; teacher
training; instruction in the operation of desk calculators and electronic digital 
computers; and the operation of summer and academic year institutes. Modern 
facilities for the mathematical sciences must provide headquarters space; class- 
rooms; seminar rooms; offices for the staff; library space; a statistics laboratory 
with desk calculators; a computation center for the electronic digital computer; 
facilities for the use of films, television, and other teaching aids; and a common 
room. 

The construction of appropriately designed facilities for mathematics is im- 
portant for a special reason at this time. There is a great shortage of mathematics 
teachers, and it is probable that this shortage will continue for many years. 
Under these conditions it is imperative that we make the teachers we do have 
more efficient than they have been in the past. Some universities are teaching 
elementary courses in sections of one to two hundred students; others would 
like to do so, but lecture rooms for classes of this size are not available. In many 
cases several staff members—even senior staff members—are crowded into one 
office. When one has a visitor, the others stop work. There are important uni- 
versities that have never been able to provide one chair and desk per staff 
member. Better classrooms, better offices, and better facilities of all kinds will 
certainly make our mathematics staffs more efficient—will enable our mathe- 
matics teachers to teach more students and to teach them better. 




FELLOWSHIP AND RESEARCH OPPORTUNITIES 


The Division of Mathematics, National Academy of Sciences—National 
Research Council calls attention to a variety of fellowships and other support 
for basic research in mathematics to be awarded by agencies of the Federal 
Government during the year 1960-61. A list of sources of support is given in the 
bulletin, “A Selected List of Major Fellowship Opportunities and Publications
for Educational Support,” available from the Fellowship Office, National Acad- 
emy of Sciences-National Research Council, 2101 Constitution Avenue, Wash- 
ington 25, D.C. 




THE CONFERENCE BOARD OF THE MATHEMATICAL SCIENCES 


The Conference Board of the Mathematical Sciences had its origin in the War
Policy Committee,¹ which was appointed by the American Mathematical
Society and the Mathematical Association of America at the end of 1942 to deal
with some of their common problems that arose out of World War II. The War
Policy Committee was supported by a grant from the Rockefeller Foundation.
With the war over, the War Policy Committee was discharged in November, 1945.
The American Mathematical Society immediately took the lead in the formation
of the Mathematical Policy Committee, usually known simply as the Policy
Committee.² This Committee grew and eventually had six of the mathematical
organizations as its members.

¹ The history of the War Policy Committee is sketched in the following references:
American Mathematical Monthly, vol. 50 (1943), pp. 138, 205, 466, 593; vol. 51 (1944),
pp. 112-115, 549; vol. 52 (1945), p. 115. Bulletin of the American Mathematical Society,
vol. 49 (1943), p. 199.

In 1958 the Policy Committee was developed into the Conference Organiza- 
tion of the Mathematical Sciences, and a constitution and by-laws were drawn 
up. In December, 1958, the Mathematical Association of America received a 
grant from the Carnegie Corporation of New York for the establishment of a 
Washington Office. At its Salt Lake City meeting in 1959 the Association recom- 
mended that the Washington office be established by the Conference Organiza- 
tion with the Carnegie grant. The Conference Organization accepted the re- 
sponsibility for establishing the Washington Office with the Carnegie grant, 
and on February 25, 1960, the Conference Organization was incorporated in the 
District of Columbia with the new name Conference Board of the Mathematical 
Sciences. G. Baley Price, who had been appointed the first Executive Secretary, 
opened the Washington office on July 1, 1960. 

The Conference Board of the Mathematical Sciences has six member organi- 
zations and no individual members. The six member organizations are the 
American Mathematical Society, the Association for Symbolic Logic, the Insti- 
tute of Mathematical Statistics, the Mathematical Association of America, the 
National Council of Teachers of Mathematics, and the Society for Industrial 
and Applied Mathematics. 

The Washington Office will not be involved in any way in the operation of the 
activities of the member organizations, and it is not expected that any of the 
activities of the member organizations will be transferred to the Washington 
Office or to the Conference Board. 

The Washington Office will gather information about events and developments, 
especially those in Washington, which concern mathematics, and will relay this 
information to those member organizations, mathematicians, and others who 
may wish to receive it. 

The Washington Office will supply information and help on mathematical 
matters as the opportunity arises. Furthermore, in carrying out this function, 
the Washington Office will arrange for assistance from the member organizations
and individual mathematicians as the situation demands and opportunity
permits. Member organizations may request the assistance of the
Washington Office.

From time to time the Washington Office will manage or operate special
projects which are compatible with the purposes and functions of the Conference
Board and which will serve the common interests of the member organizations.
These projects may be supported by contracts or grants from foundations,
government agencies, or other organizations.

² The first reference to the formation of the Policy Committee occurs in the report of
“The Twenty-Ninth Annual Meeting of the Association,” American Mathematical Monthly,
vol. 53 (1946), p. 178.

The management of the Conference Board is vested in its Council, which 
consists of two representatives from each of the member organizations and of 
six representatives at large. The officers of the Conference Board are the
following: Chairman, S. S. Wilks; Secretary, J. R. Mayor; Treasurer, A. E. Meder,
Jr.; Executive Secretary, G. B. Price.




CONFERENCE BOARD SPONSORS CONTEMPORARY MATHEMATICS 
ON CONTINENTAL CLASSROOM 


The Conference Board of the Mathematical Sciences is the mathematics 
sponsor of Contemporary Mathematics, the new Continental Classroom course 
for 1960-1961. The other sponsors are Learning Resources Institute and the 
National Broadcasting Company. The course begins on September 26 and runs
for 32 weeks, and it will be presented from 6:30 to 7:00 a.m., Monday through
Friday, in each time zone. The first semester of Contemporary Mathematics
will be devoted to Modern Algebra, by Professor John L. Kelley with the
assistance of Dr. Julius Hlavaty; the second semester will be devoted to
Probability and Statistics, by Professor Frederick Mosteller with the assistance
of Professor Paul C. Clifford.

The Conference Board appointed the following advisory committee, which 
has assisted Learning Resources Institute in planning Contemporary Mathematics:
E. G. Begle (Chairman), L. W. Cohen, R. P. Dilworth, P. S. Jones,
J. R. Mayor, E. J. McShane, A. E. Meder, Jr., and Mina S. Rees.




REGIONAL ORIENTATION CONFERENCES IN MATHEMATICS 


The National Council of Teachers of Mathematics, with financial support 
from the National Science Foundation, will hold a series of eight Regional 
Orientation Conferences in Mathematics during October, November, and Decem- 
ber of 1960. The purpose of the conferences is to bring to school administrators a 
comprehensive account of the new programs in secondary school mathematics 
that have been, and are being, developed on a national scale, and to give practical 
advice about how to introduce one of the new programs in a high school. At- 
tendance will be by invitation. The project is under the direction of Mr. Frank 
B. Allen of LaGrange, Illinois. 

The following program will be presented at each of the eight conferences: 
Address: “Progress in Mathematics and Its Implications for the Secondary
School,” G. Baley Price, Professor of Mathematics, The University of
Kansas, and Executive Secretary of the Conference Board of the Mathematical
Sciences.

Address: “The Drive to Improve School Mathematics: Comparisons and Common
Elements of Special Programs,” Dr. Kenneth Brown, Specialist in
Mathematics, United States Office of Education.

Panel Discussion: “Our Experience with the New Programs in Mathematics” (A 
local panel for each conference will be announced) 

Address: “Implementing the New Mathematics Program in Your School,” Dr.
W. Eugene Ferguson, Head, Department of Mathematics, Newton High 
School, Newtonville, Massachusetts. 




INTERNATIONAL SYMPOSIUM ON THE TRANSMISSION AND 
PROCESSING OF INFORMATION 


The Professional Group on Information Theory of the Institute of Radio 
Engineers, in cooperation with the Center of Communication Sciences, Research 
Laboratory of Electronics, Massachusetts Institute of Technology, is planning 
to hold an International Symposium on the Transmission and Processing of 
Information on September 6-8, 1961. This Symposium will be held at the 
Massachusetts Institute of Technology, Cambridge, Massachusetts.

The purpose of the Symposium will be to provide an outstanding occasion for 
the presentation of significant new research contributions, of either a theoretical 
or experimental nature. As in the case of the similar 1954 and 1956 symposia, no 
tutorial papers will appear; the program will be planned specifically for active 
specialists in the field. In order to provide opportunity for creative and thorough 
discussion, the Symposium Transactions will be distributed at least two weeks 
prior to the meetings. 

Submission of papers is hereby invited. In order to carry out the publication 
plan successfully, the following deadline schedule is necessary:
Receipt of 500-1000 word Abstracts: 1 January 1961
Receipt of full-length Papers: 1 April 1961

Authors will be notified of the preliminary acceptance of their Abstracts by 20 
January; the final program selection will be made on the basis of the complete 
Papers, and authors notified by 1 May. Abstracts and Papers should be submitted 
to the Chairman of the Organizing Committee, R. M. Fano, R.L.E., M.I.T.,
Cambridge 39, Mass. 

Additional information about the Symposium will be disseminated as plans 
develop.




NEW COURSES OFFERED 


New courses in “Advanced Statistical Methods,” “Sampling Theory and Practice,”
and “Linear Programming in Business and Industry,” designed to provide
a comprehensive presentation of recent developments in statistical theory,
applications, and procedures for graduate students and statistical technicians,
will be offered for the first time this fall by City College’s Baruch School, 17
Lexington Avenue.

Course descriptions and registration information may be obtained by writing 
to the Graduate Office, Room 1605, City College-Baruch School, 17 Lexington 
Avenue, New York 10, N.Y. 




AMERICAN MATHEMATICAL SOCIETY SYMPOSIUM ON 
MATHEMATICAL PROBLEMS IN THE BIOLOGICAL 
SCIENCES 


A Symposium on Mathematical Problems in the Biological Sciences, co-spon- 
sored by the Office of Ordnance Research and the National Science Foundation, 
will be held in connection with the April Meeting of the American Mathematical 
Society at the Hotel New Yorker in New York City, on April 6, 7, and 8. 

The objective of the Symposium will be to inform and interest mathematicians
in the problems of biology and medicine, and to stimulate investigation of topics
in both pure and applied domains. Some of the topics of the symposium are:
Self-reproduction problems: How can machines be designed that are capable of
reproducing themselves? Reliability problems: How can reliable machines (or
organisms) be designed using unreliable components? Operation of the nervous
system: analysis of the actual brain; synthesis of computers to simulate the
brain. Mathematical problems of growth and form; hydrodynamical problems of
circulation, hormones, enzymes, etc. Information theory, mathematical logic, and
combinatorics arising in problems in genetics, statistical theory, classical
theory, and the genetic ‘code’.

The selection of speakers for the Symposium has been delegated to an Invitations
and Steering Committee consisting of Dr. S. M. Ulam (Los Alamos Scientific
Laboratory), Chairman; Dr. Richard E. Bellman (Rand Corporation), Secretary;
Dr. John Jacquez (Sloan-Kettering Institute for Cancer Research); Professor
Claude E. Shannon (Massachusetts Institute of Technology); and Professor
Anthony Bartholomay (Biophysics Research Laboratory, Peter Bent Brigham
Hospital, Harvard Medical School).




IMS OFFICERS, COMMITTEES, AND REPRESENTATIVES 


This is as complete a listing for 1959 and 1960 as it was possible to obtain. The 
1959 listing is an expanded version of the list on pages 552-553, Annals of Mathe- 
matical Statistics, Volume 31, June, 1960. Ed. 


Council Members and Officers 


Terms Expire 1960                      Terms Expire 1962
David Blackwell                        T. W. Anderson
Harold Hotelling                       J. L. Hodges, Jr.
Jerzy Neyman                           Z. W. Birnbaum
I. R. Savage                           W. Hoeffding

Terms Expire 1961                      Terms Expire 1963
F. J. Anscombe                         H. Chernoff
T. E. Harris                           K. L. Chung
Leo Katz                               M. G. Kendall
S. S. Wilks                            C. Stein

1959                                   1960
President: J. Wolfowitz                President: J. W. Tukey
President-Elect: J. W. Tukey           President-Elect: E. L. Lehmann
Secretary: G. E. Nicholson, Jr.        Secretary: G. E. Nicholson, Jr.
Treasurer: A. H. Bowker                Treasurer: A. H. Bowker
Editor: W. H. Kruskal                  Editor: W. H. Kruskal
Program Coordinator: M. B. Wilk        Program Coordinator: D. M. Gilford
Associate Secretaries:                 Associate Secretaries:
  Central: J. Silber                     Central: J. Silber
  Eastern: J. Rosenblatt                 Eastern: J. Rosenblatt
  Western: G. J. Lieberman               Western: G. J. Lieberman

1959 Fellows: G. B. Dantzig, M. Fisz, G. A. Barnard, G. J. Lieberman,
I. Olkin, R. Sitgreaves
1960 Fellows: To Be Announced


COMMITTEES 
(The first person named is the chairman) 
1959 1960 
COMMITTEE ON ANNALS INDEX 


I. R. Savage, T. E. Harris, J. L. Hodges, Continued from 1959 
Jr., W. H. Kruskal, G. E. Nicholson, Jr.


25th ANNIVERSARY COMMITTEE 


B. Harshbarger, W. G. Cochran, C. C. Continued from 1959 
Craig, E. G. Olds, J. W. Tukey,
S. S. Wilks, J. Wolfowitz


BROCHURE COMMITTEE 


E. Parzen, D. Blackwell, A. H. Bowker,
J. F. Daly, B. G. Greenberg, M. H.
Hansen, G. E. Nicholson, Jr., S. S. Wilks


COMMITTEE ON EXCHANGES 


P. S. Dwyer, A. H. Bowker, W. H. Kruskal,  Continued from 1959
G. E. Nicholson, Jr. 


COMMITTEE ON FELLOWS 


Z. W. Birnbaum, W. Hoeffding,            W. Hoeffding, W. G. Cochran,
E. S. Pearson, E. J. G. Pitman,          J. L. Hodges, Jr., E. S. Pearson,
E. L. Scott, H. Solomon                  E. J. G. Pitman, H. Solomon

COMMITTEE ON FINANCE 


H. Levene, G. Noether, I. Olkin, A. H.   Continued from 1959
Bowker (ex officio)







COMMITTEE ON INSTITUTIONAL MEMBERS


M. E. Muller, Frank Akutowicz,           M. E. Muller, K. J. Arnold, E. L. Crow,
K. J. Arnold, Z. W. Birnbaum,            M. Dwass, E. Lukacs, P. D. Minton,
R. B. Murphy, S. S. Wilks                R. B. Murphy, E. Seiden


COMMITTEE INVESTIGATING THE POSSIBILITY OF BILLING FOR
PUBLICATION IN THE ANNALS


A. M. Mood, T. W. Anderson, J. H. Curtiss


COMMITTEE ON MATHEMATICAL TABLES 


D. B. Owen, G. P. Steck, R. L. Anderson,
A. H. Bowker, P. C. Cox, E. E. Cureton,
W. J. Dixon, C. Eisenhart, J. A. Greenwood,
S. S. Gupta, H. L. Harter, H. O. Hartley,
L. Katz, W. H. Kruskal (ex officio),
F. C. Leone, M. E. Muller, P. S. Olmstead,
J. W. Tukey, M. A. Woodbury, M. Zelen


D. B. Owen, G. P. Steck, R. L. Anderson,
A. H. Bowker, P. C. Cox, E. E. Cureton,
W. J. Dixon, C. Eisenhart, J. A. Greenwood,
S. S. Gupta, H. L. Harter, H. O. Hartley,
L. Katz, W. H. Kruskal (ex officio),
F. C. Leone, M. E. Muller, P. S. Olmstead,
M. A. Woodbury, M. Zelen


SUBCOMMITTEES OF CMT 
Studentized Range 


H. L. Harter, A. H. Bowker, C. Daniel, 
W. T. Federer, H. O. Hartley, G. E. 
Noether 


H. L. Harter, A. H. Bowker, C. Daniel,
W. T. Federer, H. O. Hartley, G. E.
Noether


F and Related Distributions 


P. C. Cox, R. L. Anderson, E. E. Cureton, 
D. Durand, J. A. Greenwood, L. H. 
Herbach, H. O. Hartley, E. S. Keeping, 
C. F. Kossack, G. Kulldorff, H. Solomon, 
D. Teichroew, L. Wine 


P. C. Cox, R. L. Anderson, E. E. Cureton,
D. Durand, J. A. Greenwood, H. O.
Hartley, L. H. Herbach, E. S. Keeping,
C. F. Kossack, G. Kulldorff, H. Solomon,
L. Wine


Hypergeometric Distribution 


L. Katz, B. F. Kimball, G. J. Lieberman, 
M. Sobel 


L. Katz, B. F. Kimball, G. J. Lieberman,
M. Sobel


Multivariate Distributions Related to the Normal 


S. S. Gupta, T. W. Anderson, L. H.
Herbach, E. S. Keeping, I. Olkin, D. B.
Owen, H. Ruben, M. Sobel, M. Zelen,
M. A. Woodbury


S. S. Gupta, T. W. Anderson, L. H.
Herbach, E. S. Keeping, I. Olkin, D. B.
Owen, H. Ruben, M. Sobel, M. A. Woodbury,
M. Zelen


Availability of Simple Techniques 


P. S. Olmstead, R. A. Bradley, E. E. Cureton,
W. J. Dixon, T. A. Lamke, S. B. Littauer


P. S. Olmstead, R. A. Bradley, E. E. Cureton,
W. J. Dixon, T. A. Lamke, S. B. Littauer





Cost-Free Machine Time and Computing Code Index for Statistical Functions 
F. C. Leone, J. W. Hamblen, W. H. Horton, F. C. Leone, J. W. Hamblen, W. H. Horton, 


G. F. Lunger, H. A. Meyer, P. D. Minton, G. F. Lunger, H. A. Meyer, P. D. Minton, 
M. E. Muller, M. A. Woodbury M. E. Muller, M. A. Woodbury 


Republication of Tables 


J. A. Greenwood, D. Durand, E. J. Gilbert, J. A. Greenwood, D. Durand, E. J. Gilbert, 
N. C. Severo N. C. Severo 


COMMITTEE ON MEMBERSHIP 


B. Epstein, H. E. Daniels, M. Dwass,     Continued from 1959
S. Kullback, S. Moriguti, G. R. Seth,
R. Sitgreaves, M. E. Terry


NOMINATING COMMITTEE 


1960 Election 1961 Election 
S. Karlin, M. Dwass, B. Epstein,         M. B. Wilk, B. Brown, W. G. Cochran,
G. E. Noether, L. Weiss                  D. M. Gilford, J. L. Hodges, Jr.,
                                         P. D. Meier, E. S. Pearson, W. L. Smith


1959 1960 


COMMITTEE ON PROFESSIONAL STANDARDS 


J. Lev, R. W. Burgess, C. Eisenhart, G. M. Continued from 1959 
Harrington, B. Kimball, A. W. Kimball, 
H. Marshall, R. Patton, J. Walsh 


PROGRAM COMMITTEE FOR ANNUAL MEETING 


E. Parzen, R. E. Bechhofer, J. R. Blum,  E. W. Barankin, F. C. Andrews, L. Katz,
D. G. Chapman, M. B. Wilk (ex officio),  C. Kossack, P. S. Olmstead, S. N. Roy,
L. LeCam, I. Olkin, J. Rosenblatt,       D. M. Gilford (ex officio)
R. Sitgreaves, J. L. Snell,
D. L. Wallace, M. Zelen


PROGRAM COMMITTEE FOR CENTRAL REGIONAL MEETING 


F. Graybill, I. W. Burr, J. Gurland,     F. J. Anscombe, P. Billingsley, I. Blumen,
H. L. Jones, P. R. Rider, F. C. Leone,   R. C. Bose, D. M. Gilford (ex officio),
B. Rankin, M. B. Wilk (ex officio)       H. O. Hartley, E. Lukacs, H. Ruben,
                                         I. R. Savage, H. Teicher, R. A. Wijsman


PROGRAM COMMITTEE FOR EASTERN REGIONAL MEETING 


B. Harshbarger, B. G. Greenberg,         R. Sitgreaves, R. E. Bechhofer,
D. G. Horvitz, C. Kossack, H. A. Meyer,  C. Y. Cramer, D. M. Gilford (ex officio),
J. Pratt, W. H. Horton, J. D. Hromi,     M. Halperin, W. L. Smith
M. B. Wilk (ex officio)


PROGRAM COMMITTEE FOR WESTERN REGIONAL MEETING 


F. J. Massey, F. C. Andrews, C. B. Bell, Continued from 1959
T. Ferguson, J. P. Gilbert, J. F. Hofmann,
R. F. Link, M. M. Sandomire, D. S. Stoller,
R. F. Tate, J. R. Vatnsdal,
M. B. Wilk (ex officio)


COMMITTEE TO REEXAMINE IMS FROM THE VIEWPOINT OF 
YOUNGER MEMBERS 


W. L. Smith, P. Billingsley, S. H. Brooks,
B. W. Brown, Jr., D. L. Burkholder, 
M. B. Danford, M. H. DeGroot, A. P. 
Dempster, D. A. Gardiner, W. J. Hall, 
J. W. Hamblen, J. E. Jackson, A. 
Madansky, H. E. McKean, R. Pyke, 
D. L. Wallace, O. Wesler, R. A. Wijsman 


IMS REPRESENTATIVE TO AAAS—Harold Hotelling (1959 and 1960) 


AMERICAN STANDARDS ASSOCIATION COMMITTEE ON STATISTICAL 
NOMENCLATURE: IMS REPRESENTATIVE—P. G. Hoel (1959 and 1960) 


IMS REPRESENTATIVE TO CONFERENCE ORGANIZATION OF THE MATHE- 
MATICAL SCIENCES—W. M. Rosenblatt, Z. W. Birnbaum (1959 and 1960) 


IMS REPRESENTATIVE IN DIVISION OF MATHEMATICS—NATIONAL 
RESEARCH COUNCIL 


W. A. Wallis (1959) F. C. Mosteller (from July 1, 1960) 
IMS REPRESENTATIVES TO ORGANIZING COMMITTEE OF 4TH BERKELEY 
SYMPOSIUM—H. Robbins, A. H. Bowker (1959 and 1960) 
1959 1960 


IMS REPRESENTATIVES ON JOINT ASA-IMS BROCHURE COMMITTEE 


S. S. Wilks, D. Blackwell, B. G. Greenberg,
W. H. Kruskal


IMS REPRESENTATIVE TO AMS-IMS COMMITTEE ON 
RUSSIAN TRANSLATIONS 


E. Lukacs, I. Olkin


COMMITTEE ON RUSSIAN TRANSLATIONS 


I. Olkin, K. L. Chung, J. L. Doob, E. Continued from 1959 
Lukacs, L. Weiss 


COMMITTEE TO SELECT AUDITOR 
W. R. Gaffey, M. V. Johns, Jr., E. L.
Kaplan 
COMMITTEE ON SPECIAL PAPERS 


K. L. Chung, T. W. Anderson, H. Chernoff,   T. W. Anderson, R. C. Bose, W. G. Cochran,
W. G. Cochran, J. L. Hodges, Jr.,           D. M. Gilford, J. L. Hodges, Jr.,
W. Hoeffding                                S. Karlin, W. H. Kruskal, S. S. Wilks





COMMITTEE ON SUBSCRIPTIONS 


E. P. Coleman, J. K. Adams, C. B. Bell, Continued from 1959 
K. A. Bush, L. R. Elveback, H. Harmon


COMMITTEE TO STUDY THE OFFICES OF SECRETARY AND TREASURER 


W. G. Cochran, A. H. Bowker, D. M. 
Gilford, T. E. Harris, G. E. Nicholson, 
Jr., W. A. Wallis 


STATEMENT OF COMMITTEE ON EXCHANGES 


During the years of its history, the Institute of Mathematical Statistics has 
entered into exchange arrangements involving the Annals of Mathematical Sta- 
tistics and other journals. This statement is prepared so that the members of the 
Institute may know (a) something of the reasoning which has led to the estab- 
lishing and continuation of these exchanges and (b) the titles of the journals 
with which we currently have exchange arrangements. 

Some exchanges with the Annals were established by Professor Carver in the 
years prior to the founding of the Institute of Mathematical Statistics, which 
took over the Annals after its founding in 1935. The exchanges were continued 
and extended during the first fourteen years of the life of the Institute with re- 
sponsibility for the exchanges largely in the hands of the Secretary-Treasurer. 
Since 1949 there has been a Committee on Exchanges to handle the exchange 
correspondence and to receive and organize the exchange journals. The exchange 
journals, though under the formal jurisdiction of the Secretary, are currently 
located in the Statistical Research Laboratory of The University of Michigan. 

During the period subsequent to 1935, a general policy toward exchanges was 
established which led to the adoption of general principles by the Council on 
December 29, 1949. These principles have not been changed substantially by 
later Councils. In describing these principles, extracts are taken from the Report 
of the Committee on Exchanges which was presented to the Council in August, 
1957. 

“It has been the attitude that the chief reason for exchanges is that a Society,
in a good financial position like ours, can well afford to place a few copies of 
its journal in strategic positions throughout the world. In addition to se- 
curing a wider use of the Annals, which is one of the objectives of the Insti- 
tute, such a practice may be advantageous from the standpoint of selling 
back copies. 


“The emphasis on placing the Annals in different parts of the world, rather 
than in creating a library of exchanges, has determined several aspects of 
policy. 







“1. There has been no effort in recent years on the part of the Institute to 
initiate exchanges. All exchanges introduced during recent years arise from 
requests to us. Correlated with this policy is the policy of not rushing into 
an exchange and of making sure that the exchange is strongly desired by 
the journal making the request. For example, a printed notice requesting 
an exchange is not generally interpreted by us as a specific request for 
exchange. 

“2. In general the exchange should be with a journal which is established 
with regular dates of appearance. 

“3. Exchanges with journals whose contents are somewhat similar to the 
contents of the Annals are preferred. 

“4. In general it is preferred to establish exchanges with journals of scien- 
tific organizations rather than with governmental publications or with jour- 
nals of universities. 

“5. In general it is desired to have at least one exchange in each of the 
countries of the world so that the statisticians there may have a chance of 
making contact with the Annals. Sometimes this takes precedence over (2), 
(3), or (4). 

“6. As the number of exchanges in a given country increases, the prospect 
for granting a given exchange decreases, even though the proposed exchange 
journal is appropriate. For example, we were more liberal with exchanges 
introduced in Japan directly after the war than we are now.” 

Several valuable sets (though not necessarily complete during the exchange 
period) of statistical and other journals are now the property of the Institute. It 
is possible that the Institute, at some future date, if conditions are suitable,
may wish to place these issues in a national office and make them available to 
members through a library service. 

A special effort was made, during the period from August, 1958, through 
December, 1959, to correspond with the sponsors of all journals with which we 
had exchanges as well as with those who have recently indicated a desire to 
establish an exchange with us. As a result of this exchange of correspondence, we 
are convinced that the exchanges we have in effect are, for the most part, con- 
sistent with our policy aims. Also we have established thirteen new exchanges in 
China, Czechoslovakia, Peru, Roumania, and Russia. The list of active exchanges,
now in effect, follows.

In view of the objectives outlined above, it seemed wise to list the exchange 
journals by the countries to which the issues of the Annals are sent, even for 
journals which have an international clientele. The problem of determining a 
suitable title for this listing was resolved in most cases by using one of the titles 
which appeared on the cover of the journal. 


IMS EXCHANGES IN EFFECT, 1960 


Argentina
Mathematicae Notae
Austria
Monatshefte für Mathematik







Brazil
Anuário Estatistico do Brasil
Revista Brasileira de Estatistica
Canada
Canadian Journal of Mathematics
China
Acta Mathematica Sinica
Progress in Mathematics
Cuba
Boletin Informativo, Consejo Nacional de Economia
Czechoslovakia
Czechoslovak Mathematical Journal
Denmark
Mathematica Scandinavica
El Salvador
Boletin Estadistico
England
Annals of Human Genetics
Biometrika
Journal of the London Mathematical Society
Journal of the Royal Statistical Society, Series A and B
Proceedings of the Cambridge Philosophical Society
Finland
Annals of the Finnish Academy of Sciences, Series A, I. Math.-Phys.
France
Annales de l'Institut Fourier
Journal de la Société de Statistique de Paris
Germany
Metrika (formerly Mitteilungsblatt für Mathematische Statistik and Statistische
Vierteljahresschrift)
Hungary
Demográfia
Publications of the Mathematical Institute of the Hungarian Academy of Sciences
Publicationes Mathematicae
India
Calcutta Statistical Association Bulletin
Sankhya
Israel
Bulletin of the Research Council of Israel, Section C, Technology; Section F,
Math.-Phys.
Statistical Bulletin of Israel
Italy
Metron
Revista Italiana di Economia Demografia e Statistica




Japan
Annals of the Institute of Statistical Mathematics
Annals of the Hitotsubashi Academy
Journal of the Mathematical Society of Japan
Mathematica Japonicae
Memoirs, Faculty of Science, Kyushu University, Series A, Math.
Nagoya Mathematical Journal
Tohoku Mathematical Journal, Second Series
Yokohama Mathematical Journal
Mexico
Boletín del Centro de Documentación Científica y Técnica de México
Netherlands
Indagationes Mathematicae
Revue de l'Institut International de Statistique
Pan America
Ciencia y Tecnologia
Estadistica
Peru
Anuario Bibliográfico Peruano
Poland
Annales Universitatis Mariae Curie-Sklodowska
Portugal
Revista do Centro de Estudos Económicos
Roumania
Revue de Mathématiques Pures et Appliquées
Studii şi Cercetări Matematice
Russia
Izvestiia Akademii Nauk, Seriia Matematicheskaia
Mathematics
Teoriia Veroiatnostei i ee Prilozheniia
Trudy Moskovskogo Matematicheskogo Obshchestva
Scotland
Proceedings of the Royal Society of Edinburgh
Spain
Spanish-American Trade
Trabajos de Estadistica
Switzerland
Revue suisse d'Economie politique et de Statistique
United States
Annals of Mathematics
Econometrica







Journal of the American Statistical Association
Pacific Journal of Mathematics
Uruguay
Publicaciones del Instituto de Matematica y Estadistica
Yugoslavia
Glasnik, Series II
Indeks
Statistički Bilten
Statistička Revija

Paul S. Dwyer


QUESTIONNAIRE CONCERNING THE ANNALS 
John W. Tukey
Princeton University 
The questionnaire reproduced as Figure 1 was sent to the approximately 1800
members of the Institute of Mathematical Statistics with the program of the 1960
Annual Meeting at Stanford University in Palo Alto. This report describes the
first 430 replies. (If later replies indicate any substantial change in interpretation,
a further report will be made.)


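1. Overall response. Replies were made as follows:


Question                                  Danger of     Affiliation   No        Affiliation
                                          resignation   weaker        change*   stronger
More papers about probability                 11            77          175         167
Very few papers about probability             19           164          214          33
More emphasis on directly applicable
  mathematical statistics                      5            38           90         297
Less emphasis on directly applicable
  mathematical statistics                     33           209          160          28
More space devoted to tables                  14            94          214         108
Less space devoted to tables                   5            68          308          49


* Includes “no change” and blanks.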


The general tendencies can be summarized as follows: some preference for more 
probability, stronger preference for more directly applicable mathematical 
statistics, balanced attitude toward tables (with little pro-or-con about less space 
for tables). 


2. Direction and strength of response. The category “danger of resignation”
having no parallel on the strong side, it seems reasonable to compare responses
to “more” and “less” questions on the basis of a trichotomy: L = “danger of
resignation” or “affiliation weaker”, 0 = “no change” or blank, H = “affiliation
stronger”, and to combine the answers to “more” and “less” questions into a
score as follows:

“more” questions     H    H    0    H    0    L    0    L    L
“less” questions     L    0    L    H    0    L    H    0    H
score               +2   +1   +1    0    0    0   -1   -1   -2
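Read column by column, the scoring table amounts to subtracting the value of the “less” answer from the value of the “more” answer, with H = +1, 0 = 0 and L = -1. A minimal Python sketch of that reading follows; the names `ANSWER_VALUE` and `combined_score` are illustrative and do not come from the report.

```python
# Trichotomy of Section 2: H = "affiliation stronger", 0 = "no change" or blank,
# L = "danger of resignation" or "affiliation weaker".
ANSWER_VALUE = {"H": 1, "0": 0, "L": -1}


def combined_score(more_answer: str, less_answer: str) -> int:
    """Combine the answers to a paired "more"/"less" question into one score.

    Assumes the score is the value of the "more" answer minus the value of
    the "less" answer; this reproduces the nine columns of the table above.
    """
    return ANSWER_VALUE[more_answer] - ANSWER_VALUE[less_answer]


if __name__ == "__main__":
    # Enumerate all nine combinations of the two answers.
    for more in "H0L":
        for less in "L0H":
            print(f"more={more}  less={less}  score={combined_score(more, less):+d}")
```

Under this reading, (H, H), (0, 0) and (L, L) all score 0, so the score records the direction of a respondent's preference rather than the strength of attachment to the Institute.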





July 19, 1960


MEMORANDUM 


TO: The Members of The Institute of Mathematical Statistics 
FROM: John W. Tukey, President 


A new, separate journal dealing with probability and its applications has recently been discussed in terms 
of sponsorship other than The Institute of Mathematical Statistics. Such a journal would substantially reduce 
the number of papers on probability in the ANNALS, and the officers of the Institute regard the possible formation 
of the proposed journal with mixed feelings. 


The founding of such a journal in the near future seems altogether unlikely. Since it is, and will continue
to be, important for the officers of The Institute of Mathematical Statistics to understand the relative strengths
of the conflicting feelings of the members about various aspects of the contents of the ANNALS, this
discussion seemed to provide a good opportunity to take stock of these feelings.


Accordingly, I ask those members who have definite feelings in this matter to express themselves as provided
below, and to send their expressions to me at the temporary address indicated. The officers of the Institute
can only be responsive to its membership if that membership speaks its mind.


Mail to: John W. Tukey, 2245 Page Mill Road, Palo Alto, California 


The effect of the following changes in the content of the ANNALS OF MATHEMATICAL STATISTICS on
my relation to The Institute of Mathematical Statistics would probably be:


                                          Danger of     Affiliation   No       Affiliation
                                          Resignation   Weaker        Change   Stronger
More papers about probability                [ ]           [ ]         [ ]        [ ]
Very few papers about probability            [ ]           [ ]         [ ]        [ ]
More emphasis on directly applicable
  mathematical statistics                    [ ]           [ ]         [ ]        [ ]
Less emphasis on directly applicable
  mathematical statistics                    [ ]           [ ]         [ ]        [ ]
More space devoted to tables                 [ ]           [ ]         [ ]        [ ]
Less space devoted to tables                 [ ]           [ ]         [ ]        [ ]


COMMENTS: 







The numbers of replies in terms of such scores were as follows:


Score          Probability    Dir. appl.    Tables
 +2                129
 +1                 81
  0                138
 -1                 54
 -2                 28

Mean score        +0.53         +1.09        +0.05
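Each mean score is the count-weighted average of the scores. For the Probability column this can be checked exactly. The sketch below assumes the damaged -1 entry reads 54, which is the only value that brings the column total to the 430 replies. `PROBABILITY_COUNTS` and `mean_score` are illustrative names.

```python
from fractions import Fraction

# Replies by combined score for the Probability question pair (Section 2).
PROBABILITY_COUNTS = {2: 129, 1: 81, 0: 138, -1: 54, -2: 28}


def mean_score(counts: dict[int, int]) -> Fraction:
    """Exact count-weighted mean of the combined scores."""
    total = sum(counts.values())
    weighted = sum(score * n for score, n in counts.items())
    return Fraction(weighted, total)


m = mean_score(PROBABILITY_COUNTS)
print(sum(PROBABILITY_COUNTS.values()), m, round(float(m), 2))  # 430 229/430 0.53
```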


When the response in terms of these scores is broken down in terms of the
most obvious characteristics of those replying, the observed mean scores and
certain of their standard errors are as in Table 1. While some of the tendencies
are in directions which might have been expected, none of the differences are
statistically significant. (The most noticeable appearances are a greater preference
for more directly applicable mathematical statistics by “none of the above” as
compared to “Fellows of IMS”, and a greater preference for more space devoted
to tables by “have published two or more papers in the Annals” as compared
with “Eastern hemisphere addresses” (all 3 reports common to these two groups
are neutral). These selected differences reach 2.3 and 1.3 times their standard
error, respectively.)
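The first of these ratios can be reproduced from Table 1 by dividing the difference in means by the standard error of the difference, combining the two standard errors as for independent groups (the two groups do not overlap). The sketch takes 1.17 ± .07 for “none of the above” and 0.81 ± .14 for “Fellows of IMS” from the directly applicable column; the function name is illustrative. The second ratio (1.3) cannot be checked the same way, since Table 1 gives no standard error for the “two or more papers” subgroup.

```python
import math


def difference_in_se_units(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference of two group means in units of its standard error,
    treating the groups as independent: SE(diff) = sqrt(se_a**2 + se_b**2)."""
    return (mean_a - mean_b) / math.hypot(se_a, se_b)


# "None of the above" vs "Fellows of IMS", directly applicable column of Table 1.
z = difference_in_se_units(1.17, 0.07, 0.81, 0.14)
print(round(z, 1))  # 2.3
```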

In view of the editorial policy of the Annals, which is to accept good papers 


TABLE 1


Means and standard errors of difference responses
(Detailed breakdowns and totals in parentheses)


Respondents                        Probability     Directly applicable   Tables
36 Eastern hemisphere addresses*   0.44 ± .24      1.08 ± .21            -.11 ± .17
48 Fellows of IMS*                 0.58 ± .16      0.81 ± .14            -.02 ± .15
114 Annals authors*                0.59 ± .10      0.94 ± .11            +.06 ± .11
  (75 with two or more papers)     (0.53)          (0.93)                (+.16)
  (38 with one paper only)         (0.71)          (0.95)                (-.12)
  (91 with paper(s) since 1950)    (0.56)          (0.89)                (+.03)
  (23 with no paper since 1950)    (0.62)          (1.13)                (+.09)
  (73 not Fellows)                 (0.55)          (1.05)                (+.05)
7 Unsigned                         (1.00)          (0.86)                (-.14)
274 None of above                  0.55 ± .07      1.17 ± .07            +.09 ± .07
(430 Total)                        (0.53 ± .04)    (1.09 ± .06)          (+.05 ± .05)
(First 213 received)               (0.55)          (0.94)                (-.04)


* Overlaps: 41 Fellows are Annals authors, 7 Eastern hemisphere addresses are Annals
authors, 3 Eastern hemisphere addresses are Fellows, 2 persons fall in all three categories.

Dir. appl.: 41 author-Fellows 0.72, 73 non-Fellow authors 1.05, 28 non-author, non-Fellow,
Eastern hemisphere 1.11, 274 others 1.17.







without regard to their emphases within the domain covered, it is clear that the 
authors who write for the Annals have the greatest influence on the way its 
space is divided. Since the expressed views of authors are in the same directions, and 
of approximately the same strengths, as those of the general membership, it would 
seem to be up to these same authors to readjust the emphasis of the Annals by sub- 
mitting more of the kinds of papers they would like to see themselves. 


3. Correlations. When the scores used in the analysis of the previous section 
were replaced by the means of the corresponding sections of a unit normal distri- 
bution [1], viz., 


                        Score used in correlation computation
Score for Section 2     Probability      Dir. appl.
       +2                   1.16              .75
       +1                    .28             -.32
        0                   -.25             -.88
       -1                  -1.16            -1.42
       -2                  -1.95            -2.02


and the resulting scores correlated, the following correlations were found:
   Probability vs. Directly applicable      r = -.32
   Probability vs. Tables                   r = -.20
   Directly applicable vs. Tables           r = +.77
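The replacement scores are the means of a unit normal variable within consecutive sections whose areas equal the observed proportions of replies. Reference [1] tabulates such values; the sketch below instead computes them directly with Python's `statistics.NormalDist`, using the mean of a truncated normal, E[Z | a < Z < b] = (φ(a) - φ(b)) / (Φ(b) - Φ(a)). For the Probability counts of Section 2 it gives the printed end values -1.95 and 1.16; interior values computed this way need not agree exactly with values read from a table. The function names are illustrative.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def _density_at_boundary(p: float) -> float:
    """Normal density at the quantile with lower-tail area p (0 at the two ends)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return _UNIT_NORMAL.pdf(_UNIT_NORMAL.inv_cdf(p))


def section_mean_deviates(counts: list[int]) -> list[float]:
    """Mean of a unit normal variable within each section, where the sections
    have the observed proportions and are ordered from lowest to highest score."""
    if any(c <= 0 for c in counts):
        raise ValueError("every category needs a positive count")
    n = sum(counts)
    means, cumulative = [], 0
    for c in counts:
        lo = cumulative / n
        cumulative += c
        hi = cumulative / n
        # E[Z | a < Z < b] = (phi(a) - phi(b)) / (Phi(b) - Phi(a))
        means.append((_density_at_boundary(lo) - _density_at_boundary(hi)) / (hi - lo))
    return means


# Probability question, scores -2, -1, 0, +1, +2 (counts from Section 2).
print([round(m, 2) for m in section_mean_deviates([28, 54, 138, 81, 129])])
```

Because the boundary densities telescope, the count-weighted mean of the new scores is zero, so they can be correlated directly in place of the integer scores.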
Since the largest value for the last r consistent with the marginals is 0.83, the 
correlation between preference for directly applicable mathematical statistics 
and preference for tables, while not surprising in sign, is somewhat surprising in 
strength. (This fact needs to be considered in company with the fact that the 
overall mean response to tables is neutral, while that to directly applicable 
mathematical statistics is strongly favorable.) 
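The report does not say how the bound of 0.83 was obtained. One standard way to get the largest correlation consistent with two fixed marginal distributions is to pair the sorted scores of one variable with the sorted scores of the other; by the rearrangement inequality this maximizes the cross-product while the means and variances stay fixed. The score distributions for directly applicable statistics and for tables are not given in full above, so the usage line uses hypothetical counts; `max_correlation`, `expand` and `pearson_r` are illustrative names.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def expand(values: list[float], counts: list[int]) -> list[float]:
    """Turn a frequency table into a sorted list of individual scores."""
    return sorted(v for v, c in zip(values, counts) for _ in range(c))


def max_correlation(values_x, counts_x, values_y, counts_y) -> float:
    """Largest r attainable when only the two marginal distributions are fixed:
    pair the i-th smallest x score with the i-th smallest y score."""
    xs, ys = expand(values_x, counts_x), expand(values_y, counts_y)
    if len(xs) != len(ys):
        raise ValueError("both marginals must cover the same respondents")
    return pearson_r(xs, ys)


# Hypothetical counts for three score levels, 100 respondents on each variable.
print(round(max_correlation([-1, 0, 1], [10, 20, 70], [-1, 0, 1], [40, 40, 20]), 3))
```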


4. Dangers of resignation. There is food for thought in the following breakdown
of the number of respondents checking a particular question “danger of
resignation”:


Number checking this “danger of resignation”
as one of two or three        alone

6 5 
6 13 
4 1 
1 23 
6 
1 










The discrimination between questions is clearly much better for those checking 
only one danger of resignation. Except for a greater sensitivity toward more 
tables rather than fewer, the results for these, most specifically critical, respon- 
dents parallel those obtained in Section 2. 


5. Summary of comments. Since no specific questions were being answered, 
the numbers of responses of a given sort are not easy to interpret. 

Journals: 15 favored a separate probability journal, 4.5 more if IMS were to 
be the sponsor, 11.5 were against a separate probability journal (one individual 
scored 0.5 in each of two classes). (In general, those whose affiliation with IMS 
would be strengthened by more probability in the Annals tended to favor a 
separate journal.) 1 each favored (i) a journal intermediate between Annals and 
JASA, (ii) a separate journal for statistical mathematics. 

Directly applicable mathematical statistics: 2 felt that it would be impossible to 
have less, 1 that increase would lead to more UK members. 

Tables: 8 felt they should appear elsewhere, 1 more felt extensive tables should
appear elsewhere, 3 felt they should be separately issued by IMS, 1 that the Annals
should accept all good tables, 1 that machine-computed tables should be
photographically reproduced to avoid errors.

Broad editorial policy of the Annals: 11 favored present standards and policy, 
7 wanted higher standards, 2 each said (i) don’t over-specialize, (ii) theory, not 
applications, (iii) accept all good papers. 1 each wanted (i) acceptance of papers 
even distantly related to statistics, (ii) concentration on original ideas, (iii) all 
papers on probability to be accompanied with statistical applications, (iv) no
papers, only descriptive summaries, (v) page charges eliminated. 

More emphasis was sought on: expository papers and surveys (6), statistical 
inference (2), papers not requiring highest mathematical sophistication (1), notes 
(1), papers like 30’s and early ’40’s (1), statistics (and less mathematics) (1), 
stochastic models for physics in language intelligible to physicists (1), greater 
similarity to Biometrika and JRSS, Series B (1). 

Less emphasis on: specialized experimental design was sought by 4 respondents. 

Space and readability: 3 asked for more space; 3 for more readable, less concise 
papers; 8 for easier transition to application. 

Current Annals papers: 1 each felt that (i) there are too many stereotyped
papers, (ii) some papers belong in “orbit”, (iii) they are “a collection of sterile and
useless exercises”, (iv) there should be less “thesis work”.

Operating practices: 3 suggested 6 issues a year, 1 that reprints should be
available, 1 that there should be fewer misprints and errors. 


6. Summary. The present and past Editors of the Annals can take pride in the 
generally favorable response, and in the clear fact that much of the responsibility 
for the changes in Annals emphasis which the overall membership desires falls 
upon authors. This responsibility includes both readjustment of emphasis and 
more expository papers. 







REFERENCES 


[1] Hollis M. Leverett, “Table of mean deviates for various portions of the unit normal
distribution,” Psychometrika, Vol. 12 (1947), pp. 141-152.




REPORT OF THE PRESIDENT FOR 1960 


Most details of the affairs of the Institute will be covered by reports of other 
officers. The responses to the questionnaires about the Annals are reported
elsewhere in this section. It remains for me to mention one Council action of special
interest, to announce the names of the nominating committee, and to explain 
how this has been a year of continuation and preparation for the Institute. 

1961 Annual Meeting: The Council debated the place and time of the next 
annual meeting very carefully. The desires of the membership, as expressed by mail 
ballot a few years ago, to have only one nation-wide meeting, and to meet alter- 
nately, so far as possible, with the American Mathematical Society and the 
American Statistical Association, were judged by the Council to be controlling. 
As a result, the next annual meeting of the Institute will be in Seattle, Washington,
during the week of 12 June 1961, a week in Seattle which will include a special 
AMS symposium on convex sets, and regional meetings of AMS and MAA. (No 
reasonable solution to a meeting with AMS in 1961 east of the Mississippi could 
be found.) The chairman of the program committee for this meeting will be 
David L. Wallace. F. J. Anscombe, J. Blum, E. L. Crow, J. L. Folks,
R. Gnanadesikan, A. T. James, M. R. Mickey, R. G. Miller, R. Radner, and G. S. Watson
have been invited to serve with him. I urge all members of the Institute to help 
this committee make the 1961 Annual Meeting an outstanding success. 

Nominating Committee for the 1961 Election: This consists of M. B. Wilk
(chairman), L. T. B. Brown, W. G. Cochran, D. M. Gilford, J. L. Hodges, P. R. Meier,
E. S. Pearson, and W. L. Smith.

The Annals: The Annals continues to grow in size. The reactions of the mem- 
bership to the relative emphasis it has placed on various aspects of our subject 
have been assessed, with results which can only encourage the editor. 

Younger Members: A committee of younger members under the chairmanship 
of Walter L. Smith is re-examining the state of the Institute. Other steps have 
been taken to bring more younger members into the Institute’s activities. 

Activity Outside North America: The question of how the Institute may best 
serve that portion of its membership, present and future, outside North America 
and may best participate in international statistical activities has been studied, 
with results that seem likely to lead to action during 1961. 

Membership: The Committee on Institutional Members, under the chairman- 
ship of Mervin E. Muller, has had another extremely successful year. The time 
would now seem to be ripe for a very active drive for new individual members. I 
ask every IMS member to cooperate with the activities of the Membership 
Committee, both in 1961 and afterwards. 







Finances: At the present level of dues, neither the very substantial actual 
increase in institutional members, nor any possible increase in individual mem- 
bership would suffice to meet the increased costs of the Institute, which come 
mainly from the increased size of the Annals and the increased cost per page of 
printing it. The inauguration of page charges, a step taken reluctantly and after 
extended and careful consideration, should meet this financial crisis for the im- 
mediate future. 


John Tukey




REPORT OF THE TREASURER FOR 1960 


Since I am nearing the end of my tenure as Treasurer, I recommended
a thorough audit of our operation; the firm was selected some months after the 
close of the 1959 calendar year, and the auditors urged an audit of an eighteen 
months’ period ending June 30, 1960. The audit was completed early this month 
and I am able to present an up-to-date account of our financial position. 

In general the trend of the last few years has continued with small revenue 
increases more than offset by larger expenses, principally as a result of con- 
tinued large issues of the Annals and ever-increasing printing costs. At the 1959 
Annual Meeting a series of steps was proposed to reverse this trend, and the
following propositions were approved: 


Subscription rates to be increased from $12. to $15. for U.S.
and Canada and from $10. to $12. for foreign.

Back Issue rates increased from $12. to $15. per volume and
$3.50 to $4. per issue beginning with Volume 27.

Advertising would be accepted in the Annals.


The increased rate for Back Issues was effective for 1960, but the subscription
rate increase does not become effective until Volume 32 (1961). The shift in
policy on the acceptance of advertising has been disappointing financially. A
policy of page charges, also to be effective beginning in 1961, was approved,
but because this report is being prepared before the Council meeting, I am not
yet able to present the details of its operation.

A quick review of the Revenue and Expense Summary shows that dues income
is continuing to climb. The work of the Committee for Institutional Members
has been most successful: we now have 45 institutional members. Subscriptions
continued to increase, and our investment income was also up. Sales of Back
Issues continue to show that the later, more expensive volumes are being sold
to a greater extent, and this results in less net income.

On the expense side of the ledger, the cost of the Annals reflects another
increase in printing costs. The higher salary expense results from continuing
studies of means to put the Institute on at least a break-even basis, and also
from increased membership and subscriber activity. Editorial expenses are also
up, following approval of the editor's request for more assistance.

The cumulative deficit for 1958, 1959 and 1960 will be $20,781. Of this, $18,000
will be covered by an NSF publication subsidy grant, leaving a cumulative deficit
of $2,781 for the last three years.

The budget for 1961 is our forecast under the newly instituted policies, excluding
page charges. The biggest income increase results from the higher subscription
rates, on the assumption that few subscribers will cancel as a result of the rate
increase. The printer has told us that 1960 prices will hold through 1961, so the
printing expense will depend on the size of Volume 32. This budget is based on an
Annals of 1,300 pages, as the editor reports that an unusually large number of
papers is circulating through the refereeing channels. Even with these actions,
revenue is not expected to increase enough to get us back in the black.

A Balance Sheet as of June 30, 1960 is also attached. 

Albert H. Bowker





[Table: INSTITUTE OF MATHEMATICAL STATISTICS, Revenue and Expense Summary. Revenue rows: dues (individual, U.S. and Canada; individual, other; institutional), subscriptions, income from investments, sale of back issues, other, and total revenue. Expense rows: Annals, salaries, miscellaneous expenses, contributions, capital expenditure, additional expense, travel, FICA tax, printing and binding expenses, and total expense; followed by excess of revenue over expense, reserve for addition to Annals inventory, and retained earnings, with a note on a correction of the physical inventory on hand. Columns cover several years, including a tentative column. The figures are not legible in this copy.]










INSTITUTE OF MATHEMATICAL STATISTICS 


Balance Sheet
June 30, 1960¹


Assets


Current Assets:
    Cash in bank, checking account                      $1,232.56
    Cash in banks, savings accounts                     70,778.91
    Investments:
        U. S. Government bonds, at cost     $8,857.25
        Savings certificate                  5,000.00
        Total investments                               13,857.25
    Accounts receivable:
        Dues                                $1,963.50
        Subscriptions                          275.00
        Back issues of ANNALS                1,496.25
        Total receivables                                3,734.75
    Inventory of ANNALS                                 24,236.80

Total assets                                           $113,840.27


Liabilities and Surplus


Liabilities
    Accounts payable
    Payroll taxes payable
    Dues advanced by members 200
    Advanced subscriptions 469.
    96
    435
    Wald Royalties payable
    Grant from National Science Foundation 12,
    Total liabilities                                   13,377.24

Surplus
    Reserve for life members                            $2,757.50
    Available for maintaining supply of ANNALS issued   33,452.52
    Available for general purposes                      64,253.01
    Total surplus                                      100,463.03

Total liabilities and surplus                          $113,840.27


¹ The income from subscriptions and memberships for the calendar year 1960 was
substantially realized in the first six months, while the majority of the expenses
have not yet been incurred. The operations for the calendar year 1960 are expected
to produce an excess of expenditures over receipts at least as large as the excess
of expenditures over receipts for the calendar year 1959.
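The printed totals of the balance sheet can be cross-checked by simple addition. The sketch below is illustrative only: it uses Python's `decimal` module so that cents are added exactly, transcribes the asset and surplus figures shown above, and recomputes the investment and accounts-receivable subtotals from their components. Because the individual liability amounts are only partly legible in this copy, it assumes the printed total of $13,377.24 for liabilities. The dictionary and variable names are hypothetical labels chosen for the example.

```python
from decimal import Decimal as D

# Asset figures transcribed from the June 30, 1960 balance sheet.
investments = {
    "U. S. Government bonds, at cost": D("8857.25"),
    "Savings certificate": D("5000.00"),
}
receivables = {
    "Dues": D("1963.50"),
    "Subscriptions": D("275.00"),
    "Back issues of ANNALS": D("1496.25"),
}
current_assets = {
    "Cash in bank, checking account": D("1232.56"),
    "Cash in banks, savings accounts": D("70778.91"),
    # Subtotals are recomputed from their components rather than copied.
    "Total investments": sum(investments.values(), D("0")),
    "Total receivables": sum(receivables.values(), D("0")),
    "Inventory of ANNALS": D("24236.80"),
}

# Surplus figures from the same balance sheet.
surplus = {
    "Reserve for life members": D("2757.50"),
    "Available for maintaining supply of ANNALS issued": D("33452.52"),
    "Available for general purposes": D("64253.01"),
}

# Assumption: the itemized liabilities are only partly legible,
# so the printed total is used directly.
TOTAL_LIABILITIES = D("13377.24")

total_assets = sum(current_assets.values(), D("0"))
total_surplus = sum(surplus.values(), D("0"))
liabilities_and_surplus = TOTAL_LIABILITIES + total_surplus

if __name__ == "__main__":
    print("Total investments:", current_assets["Total investments"])
    print("Total receivables:", current_assets["Total receivables"])
    print("Total assets:", total_assets)
    print("Total surplus:", total_surplus)
    print("Liabilities and surplus:", liabilities_and_surplus)
    print("Balanced:", total_assets == liabilities_and_surplus)
```

Run as a script, this reports total assets and total liabilities plus surplus both equal to 113840.27, with the receivables subtotal coming to 3734.75 and the surplus to 100463.03, so the printed totals are mutually consistent.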




REPORT OF THE EDITOR FOR 1960 


During the operating year August 1, 1959, to July 31, 1960, 206 manuscripts were
submitted to the Annals, about a thirteen per cent increase over the level of the
preceding three years. Final decisions were made on 160 manuscripts during the
1959-60 operating year, and 179 manuscripts were under editorial consideration on
July 31, 1960. The printed volume for 1960 runs to 1,254 pages, slightly under the
number authorized by the Council. There is no backlog, so all accepted manuscripts
are sent to the printer as soon as possible.

A detailed statistical report of Annals operations in 1959-60 will be sent to 
interested members on request. 

During the year, paid advertising was instituted. Page charges will be insti- 
tuted shortly. A cumulative Index-Guide to the Annals is to be prepared at the 
University of Minnesota, under the direction of I. Olkin and I. R. Savage. 

I am grateful to the Associate Editors for their time-consuming and effective
work. Mr. D. L. Wallace acted as Editor during a two-month period when I was
absent from my office; I wish to express special thanks to him. Mrs. Cynthia
Zilliac, and later Mrs. Juanita Isherwood and Mrs. Doris Jacques, have labored
devotedly and efficiently on the multitude of typographical, clerical, and
miscellaneous tasks of the Editor's office; I am very grateful to them. I also
thank the University of Chicago for its continued material aid.

Finally, I have the pleasure of listing the names of referees of papers for which 
final editorial decisions have been made during the period February 1960 to 
September 1960, inclusive. Annals referees have great responsibilities for the 
choice of manuscripts to be published and for the revision of manuscripts towards 
greater accuracy and clarity. Authors, readers, and members of the editorial 
staff should all be grateful for the generous work of the referees. 


Abbott, J. H.
Anderson, T. W.
Andrews, Fred
Anscombe, F. J.
Bellman, Richard
Billingsley, Patrick
Birnbaum, Allan
Blackwell, David
Blum, Julius R.
Blumenthal, R. M.
Breiman, Leo
Burkholder, D. L.
Chapman, Douglas G.
Chernoff, Herman
Connor, William S.
Craig, C. C.
Daly, Joseph
Daniels, H. E.
David, Herbert T.
Dempster, A. P.
Derman, Cyrus
Dixon, W. J.
Donsker, M. D.
Dwass, Meyer
Ellison, B. E.
Fano, R. M.
Feldman, Jack
Fix, Evelyn
Foster, F. G.
Fraser, D. A. S.
Gardiner, D. A.
Geisser, Seymour
Ghurye, S. G.
Good, I. J.
Goodman, Leo A.
Gupta, S. S.
Gurland, John
Hall, Marshall, Jr.
Hammersley, J. M.
Hannan, James
Hansen, Morris
Harris, T. E.
Hartley, H. O.
Herbach, Leon
Hunter, J. S.
Hutton, E. J.
James, Allan
James, G. S.
Johnson, N. L.
Karlin, S.
Katz, Leo
Kempthorne, Oscar
Kendall, M. G.
Kesten, Harry
Kiefer, Jack
Kraft, Charles H.
Kullback, S.
Laha, R. G.
Lamperti, John
LeCam, Lucien
Lehmann, E. L.
Lukacs, Eugene
Madansky, A.
Mandelbrot, Benoit
Mauldon, J. G.
Miller, Rupert
Moore, P. G.
Moses, Lincoln E.
Nadler, Jack
Noether, Gottfried
Olkin, Ingram
Owen, Donald B.
Parzen, Emanuel
Paulson, E.
Pillai, K. C. S.
Pollaczek, F.
Pratt, John
Pyke, Ronald
Quenouille, M. H.
Ruben, Harold
Sargan, John Dennis
Saunders, S.
Savage, I. R.
Savage, L. J.
Sitgreaves, Rosedith
Smith, Walter
Steck, G. P.
Stein, Charles M.
Taylor, William
Teicher, Henry
Teichroew, D.
Throckmorton, Neal
Tintner, G.
Wallace, David L.
Watson, G. S.
Weiss, Lionel
Welch, B. L.
Whittle, Peter
Wijsman, Robert A.
[illegible]
Wolfowitz, J.
Young, D. H.
Zelen, Marvin
Zyskind, George

October 1, 1960
William Kruskal, Editor




REPORT OF THE STANFORD, CALIFORNIA MEETING OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 


The eighty-fifth meeting of the Institute of Mathematical Statistics, the 
twenty-third annual meeting, was held at Stanford University, Stanford, Cali- 
fornia, on August 23-26, in conjunction with the meetings of the American 
Statistical Association, the Biometric Society (ENAR), the Biometric Society 
(WNAR), the Econometric Society, the Western Economic Association, and 
the Western Farm Economics Association. 


There were 294 members of the Institute registered for the meeting. The 
program of the meeting was as follows: 


TUESDAY, AUGUST 23, 1960 


8:30 a.m.—Invited Papers I: Time Series, Stochastic Processes (ASA and IMS)


Chairman: M. Loève, University of California, Berkeley.

1. “New Light on the Classical Problem of Decomposing an Observed Time Series into
Trend, Periodic and Stationary Components,” J. Durbin, Stanford University and
University of London.

2. “The Design of Experiments with Auto-Correlated Errors,” G. M. Jenkins, Stanford
University and Imperial College, London.

3. “On a Model of Queueing Theory and its Application to the Problem of Market Equi-
librium,” J. Lukaszewicz, Wroclaw University.


4. “The Ranking Limit Problem for Markov Chains,” R. Cogburn, University of Cali-
fornia, Berkeley.


8:30 a.m.—Contributed Papers I 


Chairman: D. Guthrie, Jr., Stanford Research Institute.

1. “The Use of Sample Quasi-Ranges in Setting Confidence Intervals for the Population
Standard Deviation,” F. C. Leone, Y. H. Rutenberg, and C. W. Topp, Case
Institute of Technology.

2. “Expected Values of Normal Order Statistics,” H. L. Harter, Air Force Research
Division, Wright-Patterson Air Force Base.

3. “Two Sample Nonparametric Tests for Scale Parameter” (Preliminary Report), J.
Klotz, University of California, Berkeley.










4. “Distribution of Quantiles in Samples from a Bivariate Population,” M. M. Siddiqui,
Boulder Laboratories, National Bureau of Standards.

5. “On the Distribution of the Ratio of the Largest of Several Chi-Squares to an Independent
Chi-Square with Application to Ranking Problems,” S. S. Gupta and M. Sobel,
Bell Telephone Laboratories, Inc.

6. “On the Non-Null Distribution of the Studentized Difference between the Two Largest
Sample Values” (Preliminary Report), A. Croteau and J. St. Pierre, Université
de Montréal.


10:30 a.m.—Invited Papers II: Statistics (ASA and IMS)


Chairman: J. W. Tukey, Princeton University and Bell Telephone Laboratories.
1. “Successive Process of Statistical Controls,” T. Kitagawa, Kyusyu University.
2. “Likelihood as a Basic Concept in Inductive Inference,” A. Birnbaum, New York
University.
3. “Non-parametric Several-Sample Tests,” F. C. Andrews, University of Oregon.


2:00 p.m.—Special Invited Paper 


Chairman: K. L. Chung, Syracuse University.
“Statistical Methods in Markov Chains,” P. Billingsley, University of Chicago.


3:00 p.m.—Invited Papers III: Information Theory 


Chairman: D. Blackwell, University of California, Berkeley.
1. “A Calculus of Information,” B. McMillan, Bell Telephone Laboratories.
2. “Another Approach to Information Theory,” L. Breiman, University of California,
Los Angeles.
3. “On the Quantity of Information of Kolmogorov,” S. P. Lloyd, Bell Telephone Labora-
tories.


3:00 p.m.—Contributed Papers II 


Chairman: M. V. Johns, Jr., Stanford University.
1. “Approximations to Neyman Type A and Negative Binomial Distributions in Practical
Problems” (Preliminary Report), S. K. Katti, Florida State University.
2. “Elements of the Sequential Design of Experiments,” S. A. Bessler, Sylvania Elec-
tronic Defense Laboratories, Mountain View, California.
3. “On a Property of a Test for the Equality of Two Normal Dispersion Matrices Against
One-sided Alternatives,” W. F. Mikhail, University of North Carolina.
4. “Two New Continuous Sampling Plans,” J. S. White, General Motors Technical
Center, Warren, Michigan.
5. “Power Characteristics of the Control Chart for Means,” F. A. Sorensen, United
States Steel Corporation Applied Research Laboratory, Monroeville, Pennsylvania.
6. “A Note on Simple Sampling Plans,” T. V. Narayana and S. G. Mohanty, Uni-
versity of Alberta.


4:00 p.m.—Spectral Analysis of Time Series (ASA and IMS) 


Chairman: J. M. Cameron, National Bureau of Standards.
1. “General Considerations in the Estimation of Spectra,” G. M. Jenkins, Stanford
University.
2. “Mathematical Considerations in the Estimation of Spectra,” E. Parzen, Stanford
University.












Discussion: 


J. W. Tukey, Princeton University


D. E. Zitmer, Naval Ordnance Test Station, China Lake


N. R. Goodman, Space Technology Laboratories


7:00 p.m.—1960 Council Meeting 
8:00 p.m.—Committee on Mathematical Tables 


WEDNESDAY, AUGUST 24, 1960 


8:30 a.m.—Invited Papers IV: Discriminant Functions and Classification Tech- 
niques (ASA, BS, and IMS) 


Chairman: A. H. Bowker, Stanford University. 


1. “The Theory of Discriminant Functions and Classification Techniques,” R. Sitgreaves,
Columbia University.


2. “Reduction of Variates Relative to Classification,” R. H. Shaw, I.B.M. Research
Center.

3. “On the Generalization of Classification Techniques,” C. F. Kossack, I.B.M. Research
Center.


4. “A General Computer Program for Multivariate Analysis of Variance with Special
Reference to the Multiple Discrimination Problem,” L. B. Jones, University of
North Carolina.


8:30 a.m.—Contributed Papers III 


Chairman: D. Haley, Acadia University, Nova Scotia, and Stanford University.

1. “Best Fit to a Random Variable by a Random Variable Measurable with Respect to a
σ-lattice,” H. D. Brunk, University of Missouri.

2. “Random Noise in Relay Control Systems,” R. C. Davis, Convair/Pomona.

3. “Estimating the Infinitesimal Generator of a Finite State, Continuous Time Markov
Process,” A. Albert, Columbia University.

4. “On a Class of Covariance Kernels Admitting a Power Series Expansion” (Preliminary
Report), N. D. Ylvisaker, Columbia University.

5. “Phase Interpretation of Coherence,” A. Shapiro, Bell Telephone Laboratories.


11:00 a.m.—Special Invited Paper 
Chairman: C. Stein, Stanford University. 


“Fiducial Probability,” D. A. S. Fraser, University of Toronto.


2:00 p.m.—Invited Papers V: Applied Probability 


Chairman: S. Karlin, Stanford University.


1. “Some Problems of Dams with Ordered Inputs,” J. Gani, University of Western Aus-
tralia.

2. “The Solution of Queueing and Inventory Models by the Theory of Semi-Markov Proc-
esses,” A. J. Fabens, Dartmouth College.

3. “Queues in Series,” J. Sacks, Columbia University.

4. “Continuous Time Storage Systems with Linear Inputs and Outputs,” R. Miller,
Stanford University.
















4:00 p.m.—I.M.S. Business Meeting 
9:00 p.m.—Gateway Singers, Stanford University, Host


THURSDAY, AUGUST 25, 1960 
8:30 a.m.—Invited Papers VI: Probability 


Chairman: J. Neyman, University of California, Berkeley. 
1. “On Invariant Probability Measures,” J. R. Blum, Sandia Corporation.
2. “A Representation of the Bivariate Cauchy Distribution,” T. S. Ferguson, University
of California, Los Angeles.
3. “On Random Walks in the Plane,” G. E. Baxter, University of Minnesota.


4. “The Law of Large Numbers in Banach Spaces,” A. Beck, University of Wisconsin
and Cornell University.


10:30 a.m.—Invited Papers VII: Multifactor-Multiresponse Experiments—Con- 
tinuous Responses, Normal or Non-Normal (ASA, BS, and IMS) 


Honoring Harold Hotelling at his 65th Birthday 


Chairman: D. M. Gilford, Office of Naval Research.
1. “General Introduction,” S. N. Roy, University of North Carolina.
2. “One Degree of Freedom Plots in Multi-Response Factorial Experiments,” R. Gnanade-
sikan and M. B. Wilk, Bell Telephone Laboratories.
3. “Continuous Responses, Not Necessarily Normal, with Applications,” R. Bargmann,
Virginia Polytechnic Institute.
4. “Applications in Psychometry,” R. D. Bock, University of North Carolina.


2:00 p.m.—Special Invited Paper (ASA and IMS) 


Chairman: H. Hotelling, University of North Carolina.
“A Survey of Time-Series Analysis,” E. Parzen, Stanford University.


3:00 p.m.—Invited Papers VIII: Statistics 


Chairman: F. Andrews, University of Oregon.
1. “Admissible and Optimal Designs in the Presence of Nuisance Parameters,” G. Elfving,
University of Helsingfors.
2. “Tolerance Regions,” I. Guttman, McGill University, Montreal.
3. “Some Comparisons Among Different Types of Random Allocation Designs,” A. P.
Dempster, Harvard University.


4. “Estimating a Mixed Exponential Response Law,” F. J. Anscombe, Princeton Uni-
versity.


3:00 p.m.—Contributed Papers V 


Chairman: A. W. Marshall, Stanford University.

1. “Expansions for Convolutions,” R. Dawson, American Systems Incorporated.

2. “Maximal Independent Stochastic Processes,” C. B. Bell, Jr., San Diego State
College.

3. “Normal Approximation to the Chi-square and Non-central F Probability Functions,”
N. C. Severo and M. Zelen, University of Buffalo and National Bureau
of Standards.

4. “Estimation of the Scale Parameter in the Weibull Distribution by Means of a Life
Test Censoring by Both Time and Number of Failures,” E. H. Lehman, Jr., Univer-
sity of North Carolina.







5:00 p.m.—1961 Council Meeting 
10:00 p.m.—Informal Party (All Societies) 


FRIDAY, AUGUST 26, 1960 
8:30 a.m.—Contributed Papers IV 


Chairman: I. Olkin, University of Minnesota.
1. “Sequential Model Building for Prediction in Regression Analysis, I” (Preliminary
Report), H. J. Larson and T. A. Bancroft, Iowa State University.
2. “On Sampling with Varying Probabilities and with Replacement in Sub-Sampling
Designs,” J. N. K. Rao, Iowa State University.
3. “A Calculus for Factorial Arrangements,” M. Zelen and B. Kurkjian, National
Bureau of Standards and Diamond Ordnance Fuze Laboratories.
4. “Three Quarter Replicates of 2⁴ and 2⁵ Designs,” P. W. M. John, California Research
Corporation, Richmond, California.
5. “Alias Sets of Error Vectors in the Theory of Error Correcting Group Codes,” R. C.
Bose, University of North Carolina and Case Institute of Technology.
6. “Some Results on Transformations in the Analysis of Variance,” M. M. Rao, Carnegie
Institute of Technology.


10:30 a.m.—Invited Papers IX: Multifactor-Multiresponse Experiments (con-
tinued): Categorical or Discrete Responses
Honoring Harold Hotelling at his 65th Birthday


Chairman: H. Solomon, Stanford University.
1. “General Introduction,” S. N. Roy, University of North Carolina.
2. “Asymptotic Power of the Tests,” E. Diamond, Johns Hopkins University.


Contributed Papers Presented by Title: 


1. “Zero Correlation and Independence,” H. O. Lancaster, University of Sydney,
Australia.

2. “On the Generalization of Sverdrup’s Lemma and Its Applications to Multivariate
Distribution Theory,” D. G. Kabe, Karnatak University, Dharwar. (Introduced
by B. D. Tikkiwal.)

3. “On the Unbiasedness of Yates’ Method of Estimation Using Interblock Information,”
F. A. Graybill and V. Seshadri, Oklahoma State University.

4. “Minimal Sufficient Statistics for Eisenhart’s Model II in a Class of Two-way Classifi-
cation Models,” D. L. Weeks and F. A. Graybill, Oklahoma State University.

5. “Minimal Sufficient Statistics for the Two-way Classification Mixed Model Design,”
R. A. Hultquist and F. A. Graybill, Oklahoma State University.

6. “A Set of Sufficient Statistics for Variance Components in a Two-way Classification
Model with Unequal Numbers in the Subclasses,” D. L. Weeks and F. A. Graybill,
Oklahoma State University.

7. “Sample Size for a Specified Width Confidence Interval on the Variance of a Normal
Distribution,” F. A. Graybill and R. D. Morrison, Oklahoma State University.

8. “Limiting Distribution of the Maximum in an Infinite Sequence of Exchangeable Random
Variables,” S. M. Berman, Columbia University.

9. “The Covariance Function of a Simple Trunk Group, with Applications to Traffic
Measurement,” V. E. Benes, Bell Telephone Laboratories and Dartmouth College.

10. “The Sequential Design of Experiments for Infinitely Many States of Nature,” A.
Albert, Columbia University.







11. “On a Geometrical Method of Construction of Cyclic PBIB” (Preliminary Report),
E. Seiden, Northwestern University.

12. “Circular Error Probabilities,” H. L. Harter, Air Force Research Division, Wright-
Patterson Air Force Base.

13. “Comparison of Normal Scores and Wilcoxon Tests,” J. L. Hodges, Jr. and E. L.
Lehmann, University of California, Berkeley.

14. “Maximal Independent Stochastic Processes,” C. B. Bell, Jr., San Diego State
College.

15. “On Sufficient Conditions for Consistent Parameter-Estimates in a Stochastic Difference
Equation with Regression on Several Lagged and Non-Stochastic Variables,” F.
Eicker, University of North Carolina.

16. “Multivariate Extremal Distributions,” E. J. Gumbel, Columbia University.




PUBLICATIONS RECEIVED 


Anuario Estadístico de España, 1960, Edición Manual, Presidencia del Gobierno, Instituto
Nacional de Estadística, Ferraz 41, Madrid, Spain, 1001 pp.

Finney, D. J., An Introduction to the Theory of Experimental Design, University of Chicago 
Press, Chicago, 1960, 223 pp., $7.00. 

Handbook on Data Processing Methods, Part I, Provisional Ed., United Nations Food and 
Agriculture Organization, Rome, 1959, 111 pp., $1.00 or 5s. 

Khintchine, A. Y., Mathematical Methods in the Theory of Queueing, No. 7 of Griffin's Sta- 
tistical Monographs and Courses, Ed. M. G. Kendall, Hafner Pub. Co., New York, 
1960, 120 pp., $5.50. 

Proceedings of the Symposia in Applied Mathematics, Volume 10, ‘‘Combinatorial Analysis,’’ 
American Mathematical Society, Providence, R. I., 1960, 311 pp., $7.70. 


Runnenburg, J. Th., On the Use of Markov Processes in One-Server Waiting-Time Problems 
and Renewal Theory, Ph.D. thesis, University of Amsterdam, Klein Offsetdrukkerij 
Poortpers N. V., Amsterdam, 1960, 139 pp. 

Youden, W. J., Statistical Design, (a collection of articles by W. J. Youden, reprinted from 
Industrial and Engineering Chemistry), American Chemical Society, Washington, 1960. 
Copies may be obtained from Reprint Department, ACS Applied Publications, 1155 
Sixteenth St., N.W., Washington 6, D. C., for $2.00. 





ADVERTISING IN 


THE ANNALS of 
MATHEMATICAL STATISTICS 


ADVERTISEMENTS for books, recruitment of professional 
personnel, etc., may now be placed in the Annals of 
Mathematical Statistics. Only full-page and half-page 
advertisements will be accepted. For details about 
costs, deadlines, sizes, and so on, please write to 


Mr. Edgar M. Bisgyer 
Advertising Manager 
American Statistical Assn. 
1757 K Street, N.W. 
Washington 6, D. C. 








JOURNAL OF THE 


ROYAL STATISTICAL SOCIETY 
Series B (Methodological)
Vol. 22, No. 2, 1960        Ann. Sub. incl. postage £3 2s 0d


CONTENTS


Models in the Analysis of Variance. By R. L. Plackett (With Discussion).
Discrete Stochastic Processes in Population Genetics. By W. F. Boomer (With Discussion).
A Problem of Delayed Service I. By A. Ben-Israel and P. Naor.
A Problem of Delayed Service II. By A. Ben-Israel and P. Naor.
On the Transient Behaviour of a Simple Queue. By P. D. Finch.
Queueing at a Single Serving Point with Group Arrival. By B. W. Conolly.
Some Extensions of Bayesian Inference Proposed by Mr. Lindley. By R. A. Fisher.
A Method of Estimation of Missing Values in Multivariate Data Suitable for use with an Electronic
Computer. By S. F. Buck.
Estimation of Parameters of a Multivariate Normal Population from Truncated and Censored Populations.
By Naunihal Singh.
Necessary Restrictions for Distributions a posteriori. By D. A. Sprott.
Weight of Evidence, Corroboration, Explanatory Power, Information and the Utility of Experiments.
By I. J. Good.
The Logistic Process: Tables of the Stochastic Epidemic Curve and Applications. By Edwin Mansfield
and Carlton Hensley.
Some Notes on Pistimetric Inference. By A. D. Roy.
Evaluation of Determinants, Characteristic Equations and their Roots for a Class of Patterned Matrices.
By S. N. Roy, B. G. Greenberg and A. E. Sarhan.
Bounds for the Expected Sample Size in a Sequential Probability Ratio Test. By M. N. Ghosh.
A Note on Tests of Homogeneity Applied after Sequential Sampling. By D. R. Cox.
The Interaction Algorithm and Practical Fourier Analysis: an Addendum. By I. J. Good.
Approximate Solutions of Green’s Type for Univariate Stochastic Processes. By H. E. Daniels.
The Wilcoxon Test and Non-null Hypotheses. By G. B. Wetherill.


The Royal Statistical Society, 21, Bentinck Street, London, W. 1 


BIOMETRIKA 
Volume 47, Parts 3 and 4 Contents December 1960 


Memoirs: 


Leslie, P. H. & Gower, J. C. The properties of a stochastic model for the predator-prey type of interaction
between two species. Blom, Gunnar. Hierarchical birth and death processes I. Blom, Gunnar.
Hierarchical birth and death processes II. Glenn, W. A. A comparison of the effectiveness of
tournaments. Pearce, S. C. Supplemented balance. Folks, John Leroy & Kempthorne, Oscar. The
efficiency of blocking in incomplete block designs. Haight, Frank A. Queueing with balking, II. David,
H. A. & Perez, Carmen A. On comparing different tests of the same hypothesis. Farlie, D. J. G. The
performance of some correlation coefficients for a general bivariate distribution. McFadden, J. A. Two
expansions for the quadrivariate normal integral. Laha, R. G. & Lukacs, E. On a problem connected with
quadratic regression. McCullough, Roger, Gurland, John & Rosenberg, Lloyd. Small sample behaviour
of certain tests of the hypothesis of equal means under variance heterogeneity. Sukhatme, Balkrishna V.
Power of some two-sample non-parametric tests. Ewan, W. D. & Kemp, K. W. Sampling inspection of
continuous processes with no autocorrelation between successive results. Blyth, Colin R. & Hutchinson,
David W. Table of Neyman-shortest unbiased confidence intervals for the binomial parameter. Bennett,
B. M. & Hsu, P. On the power function of the exact test for the 2 × 2 contingency table.

Simpson, J. A. & Welch, B. L. Table of the bounds of the probability integral when the first four moments
are given. Severo, Norman C. & Zelen, Marvin. Normal approximation to the chi-square and non-central
F probability functions. Barton, D. E., David, F. N. & O’Neill, Agnes F. Some properties of the distri-
bution of the logarithm of non-central F. Lindley, D. V., East, D. A. & Hamilton, P. A. Tables for making
inferences about the variance of a normal distribution. Barton, D. E., David, F. N. & Merrington, M.
Tables for the solution of the exponential equation, exp(−a) + ka = 1.

Miscellanea: Contributions by G. E. Bardwell, B. R. Bhat and J. Gani, B. R. Cox, R. N. Curnow,
N. Gilbert, W. M. Harper and J. A. Macdonald, N. L. Johnson and D. H. Young, M. G. Kendall,
A. N. Kshirsagar, H. Linhart, T. A. Ramasubban.

Reviews:

Other Books Received:

The subscription, payable in advance, is 54/- (or $8.00) per volume (including postage).
Cheques should be made payable to Biometrika, crossed “a/c Biometrika Trust” and sent to The Secretary,
Biometrika Office, University College, London W.C.1. All foreign cheques must be drawn on a Bank having
a London agency. Members of the Institute of Mathematical Statistics may subscribe to Biometrika through
the Institute, which will accept subscriptions to Biometrika at the special rate of $6.50 if paid to the Treasurer
of the Institute before March 1st.


Issued by THE BIOMETRIKA OFFICE, University College, London 



















JOURNAL OF 
THE AMERICAN STATISTICAL ASSOCIATION 


Volume 55 December, 1960 Number 292 


On Finite Sample Distributions of GCL Identifiability Test Statistics . . . Robert L. Basmann
A Note on the Limiting Relative Efficiency of the Wald Sequential Probability Ratio
Test . . . Robert E. Bechhofer
Bivariate Exponential Distributions . . . E. J. Gumbel
On the Exact Variance of Products . . . Leo A. Goodman
Circular Error Probabilities . . . H. Leon Harter
On Conditional Expectations of Location Statistics . . . Robert V. Hogg
Effect of Bias on Estimates of the Circular Probable Error . . . P. B. Moranda
Market Growth, Company Diversification and Product Concentration, 1947-1954
. . . Ralph L. Nelson
Unbiased Ratio Estimators in Stratified Sampling . . . José Nieto de Pascual
A New Binomial Approximation for Use in Sampling from Finite Populations
. . . P. J. Sandiford
Internal Migration Statistics for the United States . . . Everett S. Lee and Anne S. Lee
Bibliography on Simulation, Gaming, Artificial Intelligence and Allied Topics
. . . Martin Shubik








For further information, please contact: 
AMERICAN STATISTICAL ASSOCIATION 
1757 K Street, N.W. Washington 6, D. C. 





SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 22, Parts 3 & 4, 1960 
The addition of random seem... 


cy‘arviibe random variable 


aaa sytnmetry in multivariate normal 
On the r test in the intrablock analysis of a balanced incomplete block design .._.. 
An adminaible estimate for any sampling design 
Non null Ceataten of De eae wae S gaterin tl Goede 
Nonparametric linear estimation of common median of quae paaaene from symmetrically censored 
observations : Jobn = “Tobe 


Multivariate analysis: an indispensable statistical aid in applied research . . . C. Radhakrishna Rao
Maximum likelihood estimation for positively regular Markov chains . . . B. R. Bhat
A stochastic model : : 


note on the structure of a certain : ‘ Das 
A short table of the hypergeometric function oF i(e; 2). j dévote i Edward ‘cen 
Approximate confidence interval for linear functions of means of K populations when the population variances
are not equal
Measurement of attrition in technical education 
A note on the efficiency of — sampling for stratification 
On the standard error of the Lorenz concentration ratio 
On a problem of estimating increase in consumer demand 


Annual Subscription: 30 rupees ($10.00); 10 rupees ($3.50) per issue.
Back Numbers: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue.
Subscriptions and orders for back numbers should be sent to
STATISTICAL PUBLISHING SOCIETY
204/1 Barrackpore Trunk Road, Calcutta 35, India





ECONOMETRICA 
Journal of the Econometric Society 
Contents of Vol. 28, No. 4 - October 1960 


James Dursenserny, Orro Ecxstetn, anp Gary Fromm: A Simulation of the United States Economy in 
Recession 


Martin BronrensrenneR AnD THomas Mayer: Liquidity Functions in the American Economy 
Symposium: 
Cant Cunrtst: Simultaneous Equation Estimation: Any Verdict Yet? 
Currorp Hitprern: Simultaneous Equations—Any Verdict Yet? 
L. R. Kuxw: Single Equation Vs. Equation System Methods of Estimation in Econometrics. 
Ta-Cuune Liv: Underidentification, Structural Estimation, and Forecasting. 
Hrrorum: Uzawa: Market Mechanisms and Mathematical Programming. 


Surat N. Sneentvasa lyencar: On a Method of Computing the Engel Elasticities from the Concentration 
Curves. 

Daue W. Joncenson: A Dual Stability Theorem. 

Eart O. Heavy anv Joun Besex: Expansion Paths for Some Production Functions 

Alfred Cowles: Notes Regarding Comments by Professor Holbrook Working on Cowles-Jones 1937
Econometrica Paper.

Holbrook Working: Note on the Correlation of First Differences of Averages in a Random Chain.

Edwin Mills: A Note on Seasonal Inventories.

Michael J. Brennan: Reply.

Report of the Indian Meeting.

Book Reviews.


MATHEMATICAL REVIEWS


A Journal Containing 


REVIEWS OF MATHEMATICAL LITERATURE 
Pure and Applied 


of the Entire World 
With full Subject and Author Indices 


This journal is an indispensable tool for all those who need to keep up with new
research in pure and applied mathematics. It includes extensive coverage of


PROBABILITY and STATISTICS
STOCHASTIC PROCESSES & TIME SERIES
THEORY OF STATISTICAL INFERENCE
SAMPLING TECHNIQUES
ANALYTICAL PROBABILITY THEORY


Subscriptions are accepted to cover the calendar year.
Issues appear monthly.

$50 per year 

$16 to members of the American Mathematical Society 


Send Subscription Orders to 


AMERICAN MATHEMATICAL SOCIETY
190 Hope Street, Providence, Rhode Island





TECHNOMETRICS 
A Journal of Statistics 
for the Physical, Chemical and Engineering Sciences 
Vol. 2, No. 3 and 4, Contents, August/November 1960 


The Compound Hypergeometric Distribution and a System of Single Sampling Inspection Plans Based on
Prior Distributions and Costs, A. Hald; Some Remarks on the Bayesian Solution of the Single Sampling
Inspection Scheme, G. B. Wetherill; Serial ...


Conclusions vs. Decisions, J. W. Tukey
Estimation from Life Test Data, B. Epstein
G. E. P. Box and D. W. ...; Graphical Procedure for Fi...
of Tolerance-Limit Factors for Normal Distributions
On Methods of Constructing Sets of Mutually Orthogonal Latin Squares Using a Computer, I,
R. C. Bose, I. M. Chakravarti and D. E. Knuth


Technometrics is published quarterly in February, May, August, and November. The annual non-member 
subscription rate is $8.00. To members of the American Statistical Association and the American Society for 
Quality Control the rate is $6.00. Checks should be made payable to Technometrics and addressed to
Technometrics, Post Office Box 587, Benjamin Franklin Station, Washington 6, D. C.








THE INSTITUTE OF MATHEMATICAL STATISTICS 
(Organized September 12, 1935) 


OFFICERS 
President:
E. L. ..., Department of Statistics, University of California, Berkeley 4, California


President-Elect:
A. H. Bowker, Department of Statistics, Stanford University, Stanford, California


Secretary:
G. E. Nicholson, Jr., Department of Statistics, University of North Carolina, Chapel Hill, North Carolina

Treasurer: 




Editor:
William Kruskal, Department of Statistics, Eckhart Hall, University of Chicago, Chicago 37, Illinois

















TENTATIVE SCHEDULE 
EASTERN REGIONAL MEETING
Ithaca, New York, April 21-22, 1961 







